Abstract

We introduce new Gaussian proposals to improve the efficiency of the standard Hastings-Metropolis
algorithm in Markov chain Monte Carlo (MCMC) methods, used for sampling from a target
distribution in large dimension $d$. The improved complexity is $O(d^{1/5})$, compared to the
complexity $O(d^{1/3})$ of the standard approach. We prove an asymptotic diffusion limit theorem
and show that the relative efficiency of the algorithm can be characterised by its overall
acceptance rate (with asymptotic value 0.704), independently of the target distribution.
Numerical experiments confirm our theoretical findings.

Keywords: weak convergence, Markov chain Monte Carlo, diffusion limit, exponential
ergodicity.
1 Introduction

Consider a probability measure $\pi$ on $\mathbb{R}^d$ with density again denoted by $\pi$ with
respect to the Lebesgue measure. The Langevin diffusion $\{x_t, t \geq 0\}$ associated with $\pi$
is the solution of the following stochastic differential equation:

$$\mathrm{d}x_t = \tfrac{1}{2}\Sigma\nabla\log\pi(x_t)\,\mathrm{d}t + \Sigma^{1/2}\,\mathrm{d}W_t , \tag{1}$$

where $\{W_t, t \geq 0\}$ is a standard $d$-dimensional Brownian motion, and $\Sigma$ is a given
positive definite symmetric matrix. Under appropriate assumptions [10] on $\pi$, it can be shown
that the dynamics generated by (1) are ergodic with unique invariant distribution $\pi$. This is a
key property of (1), and taking advantage of it makes it possible to sample from the invariant
distribution $\pi$. In particular, if one could solve (1) analytically and then let the time $t$
tend to infinity, it would be possible to generate samples from $\pi$. However, there are only a
limited number of cases [13] where such an analytical formula exists. A standard approach is to
discretise (1) using a one-step integrator. The drawback of this approach is that it introduces a
bias, because in general $\pi$ is not invariant with respect to the Markov chain defined by the
discretization [26, 15, 1]. In addition, the discretization might fail to be ergodic [24], even
though (1) is geometrically ergodic.

An alternative way of sampling from $\pi$, which does not face the bias issue introduced by
discretizing (1), is by using the Metropolis-Hastings algorithm [11]. The idea is to construct
a Markov chain $\{x_j, j \in \mathbb{N}\}$, where at each step $j \in \mathbb{N}$, given $x_j$, a
new candidate $y_{j+1}$ is generated from a proposal density $q(x_j, \cdot)$. This candidate is
then accepted ($x_{j+1} = y_{j+1}$) with probability $\alpha(x_j, y_{j+1})$ given by

$$\alpha(x,y) = \min\left(1, \frac{\pi(y)q(y,x)}{\pi(x)q(x,y)}\right) , \tag{2}$$

and rejected ($x_{j+1} = x_j$) otherwise. The resulting Markov chain $\{x_j, j \in \mathbb{N}\}$ is
reversible with respect to $\pi$ and under mild assumptions is ergodic [14, 19].
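
The accept/reject mechanism described above can be sketched in a few lines. The following is a minimal illustration, not an implementation taken from this paper: the function names (`mh_acceptance`, `mh_step`, `run_chain`) and the one-dimensional standard normal target with a symmetric random-walk proposal of step 0.5 are assumptions chosen for the example. The ratio in (2) is evaluated on the log scale, which is a common way of avoiding overflow.

```python
import math
import random


def mh_acceptance(log_pi, log_q, x, y):
    """Acceptance probability (2), evaluated on the log scale for stability.

    log_pi(x)   : log of the target density, up to an additive constant.
    log_q(x, y) : log of the proposal density q(x, y) for a move from x to y.
    """
    log_ratio = (log_pi(y) + log_q(y, x)) - (log_pi(x) + log_q(x, y))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def mh_step(log_pi, sample_q, log_q, x, rng):
    """One transition: draw a candidate from q(x, .), then accept or reject it."""
    y = sample_q(x, rng)
    accepted = rng.random() < mh_acceptance(log_pi, log_q, x, y)
    return (y if accepted else x), accepted


def run_chain(log_pi, sample_q, log_q, x0, n_steps, seed=0):
    """Run n_steps transitions from x0; return the path and the acceptance rate."""
    rng = random.Random(seed)
    x, path, n_accepted = x0, [x0], 0
    for _ in range(n_steps):
        x, accepted = mh_step(log_pi, sample_q, log_q, x, rng)
        path.append(x)
        n_accepted += accepted
    return path, n_accepted / n_steps


# Illustrative (hypothetical) one-dimensional target and symmetric proposal.
def log_target(x):
    return -0.5 * x * x          # standard normal, unnormalised


def rw_sample(x, rng):
    return x + 0.5 * rng.gauss(0.0, 1.0)


def rw_log_q(x, y):
    return 0.0                   # symmetric proposal: q cancels in the ratio (2)
```

Because only the ratio of densities enters (2), the normalising constant of the target is never needed, which is why `log_target` can be left unnormalised.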

The simplest proposals are random walks, for which $q$ is the transition kernel associated
with the proposal

$$y = x + \sqrt{h}\,\Sigma^{1/2}\xi , \tag{3}$$

where $\xi$ is a standard Gaussian random variable in $\mathbb{R}^d$. This leads to the well-known
Random Walk Metropolis (RWM) algorithm. This proposal is very simple to implement, but it suffers
from a (relatively) high rejection rate, because it does not use information about $\pi$ to
construct appropriate candidate moves.

Another commonly used family of proposals is based on the Euler-Maruyama discretization
of (1), for which $q$ is the transition kernel associated with the proposal

$$y = x + (h/2)\Sigma\nabla\log\pi(x) + \sqrt{h}\,\Sigma^{1/2}\xi , \tag{4}$$

where $\xi$ is again a standard Gaussian random variable in $\mathbb{R}^d$. This algorithm is also
known as the Metropolis Adjusted Langevin Algorithm (MALA), and it is well established that it has
better convergence properties than the RWM algorithm in general. This method directs the
proposed moves towards areas of high probability for the distribution $\pi$, using the gradient of
$\log\pi$. There is now a growing literature on gradient-based MCMC algorithms, as exemplified
through the two papers [8, 5] and the references therein. We also mention here function
space MCMC methods [5]. Assuming that the target measure has a density w.r.t. a Gaussian
measure on a Hilbert space, these algorithms are defined in infinite dimension and avoid
completely the dependence on the dimension $d$ faced by standard MCMC algorithms.
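
To make the MALA proposal (4) concrete, here is a minimal one-dimensional sketch with $\Sigma = 1$. The function names and the standard normal example target are assumptions made for illustration only. Unlike the random walk (3), the proposal is not symmetric, so its density has to be evaluated in both directions when forming the ratio (2); `mala_log_q` provides that density.

```python
import math


def mala_mean(x, h, grad_log_pi):
    """Mean of the MALA proposal (4) in one dimension with Sigma = 1."""
    return x + 0.5 * h * grad_log_pi(x)


def mala_propose(x, h, grad_log_pi, xi):
    """Proposal (4): Euler-Maruyama step of (1) driven by a N(0, 1) draw xi."""
    return mala_mean(x, h, grad_log_pi) + math.sqrt(h) * xi


def mala_log_q(x, y, h, grad_log_pi):
    """Log of the Gaussian proposal density q(x, y) = N(y; mala_mean(x), h)."""
    diff = y - mala_mean(x, h, grad_log_pi)
    return -0.5 * diff * diff / h - 0.5 * math.log(2.0 * math.pi * h)


def grad_log_std_normal(x):
    """Illustrative target: pi = N(0, 1), so the gradient of log pi is -x."""
    return -x
```

For this target the drift pulls every proposal towards the mode at 0, which is the mechanism the text refers to when it says that the moves are directed towards areas of high probability.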

A natural question is whether one can improve on the behaviour of MALA by incorporating
more information about the properties of $\pi$ in the proposal. A first attempt would be
to use as proposal a one-step integrator with high weak order for (1), as suggested in the
discussion of [8]. Although this turns out not to be sufficient, we shall show that, by slightly
modifying this approach and not focusing on the weak order itself, we are able to construct a
new proposal with better convergence properties than MALA. We mention that an analogous
proposal is presented independently in [7] in a different context to improve the strong order
of convergence of MALA.

Thus our main contribution in this paper is the introduction and theoretical analysis of
the fMALA algorithm (fast MALA), and its cousins which will be introduced in Section 3.
These algorithms provide, for the first time, implementable gradient-based MCMC algorithms
which can achieve convergence in $O(d^{1/5})$ iterations, thus improving on the $O(d^{1/3})$ of
MALA and many related methods. These results are obtained as a consequence of high-dimensional
diffusion approximation results. As well as giving these order of magnitude results for
high-dimensional problems, we shall also give stochastic stability results, specifically results
about the geometric ergodicity of the algorithms we introduce under appropriate regularity
conditions.

Whilst the algorithms we describe have clear practical relevance for MCMC use, it is 
important to recognise the limitations of this initial study of these methodologies, and we 
shall note and comment on two which are particularly important. In order to obtain the 
diffusion limit results we give, it is necessary to make strong assumptions about the structure 
of the sequence of target distributions as d increases. In our analysis we assume that the target 
distribution consists of d i.i.d. components as in the initial studies of both high-dimensional 
RWM and MALA algorithms [20, 21]. Those analyses were subsequently extended (see for 
example [22]) and supported by considerable empirical evidence from applied MCMC use. We 
also expect that in the context of this paper, our conclusions should provide practical guidance 
for MCMC practitioners well beyond the cases where rigorous results can be demonstrated, 
and we provide an example to illustrate this in Section 5. 

Secondly, our diffusion limit results depend on the initial distribution of the Markov chain
being the target distribution $\pi$, which is clearly impractical in real MCMC contexts. The works
[4, 12] study the case of MCMC algorithms (specifically RWM and MALA algorithms) started away
from stationarity. On the one hand, it turns out that MALA algorithms are less robust than
RWM when starting at under-dispersed values, in that scaling strategies optimising
mixing in stationarity can be highly suboptimal in the transient phase, often with initial
moves having exponentially small acceptance probabilities (in $d$). On the other hand, a
slightly more conservative strategy for MALA still achieves $O(d^{1/2})$ compared to $O(d)$ for
RWM. It is natural to expect the story for fMALA to be at least as involved as that for
MALA, and we give some empirical evidence to support this in the simulation study of
Section 5. Future work will underpin these investigations with theoretical results analogous
to those of [4, 12]. From a practical MCMC perspective however, it should be noted that
strategies which mix MALA-transient optimal scaling with fMALA-stationary optimal scaling
will perform in a robust manner, both in the transient and stationary phases. Two of these
effective strategies are illustrated in Section 5.

The paper is organised as follows. In Section 2 we provide a heuristic for the choice of 
the parameter h used in the proposal as a function of the dimension d of the target and 
present three different proposals that have better complexity scaling properties than RWM 
and MALA. In Section 3, we present fMALA and its variants, and prove our main results 
for the introduced methods. Section 4 investigates the ergodic properties of the different 
proposals for a wide variety of target densities $\pi$. Finally, in Section 5 we present numerical
results that illustrate our theoretical findings. 

2 Preliminaries 

In this section we discuss some key issues regarding the convergence of MCMC algorithms. 
In particular, in Section 2.1 we discuss some issues related to the computational complexity 
of MCMC methods in high dimensions, while in Section 2.2 we present a useful heuristic for 
understanding the optimal scaling of a given MCMC proposal, and based on this heuristic 
formally derive a new proposal with desirable scaling properties.


2.1 Computational Complexity 

Here we discuss a heuristic approach for selecting the parameter $h$ in all proposals mentioned
above as the dimension of the space $d$ goes to infinity. In particular, we choose $h$ proportional
to an inverse power of the dimension $d$ such that

$$h \propto d^{-\gamma} . \tag{5}$$

This implies that the proposal $y$ is now a function of: (i) the current state $x$; (ii) the
parameter $\gamma$ through the scaling above; and (iii) the random variable $\xi$ which appears in
all the considered proposals. Thus $y = y(x, \xi; \gamma)$. Ideally $\gamma$ should be as small as
possible so the chain makes large steps and samples are correlated as little as possible. At the
same time, the acceptance probability should not degenerate to 0 as $d \to \infty$, also to prevent
high correlation amongst samples. This naturally leads to the definition of a critical exponent
$\gamma_0$ given by

$$\gamma_0 = \inf_{\gamma_c \geq 0}\left\{\gamma_c : \liminf_{d\to\infty} \mathbb{E}\left[\alpha(x,y)\right] > 0 , \ \forall \gamma \in [\gamma_c, \infty)\right\} . \tag{6}$$

The expectation here is with respect to $x$ distributed according to $\pi$ and $y$ chosen from the
proposal distribution. In other words, we take the largest possible value for $h$, as a function
of $d$, constrained by asking that the average acceptance probability is bounded away from
zero, uniformly in $d$. The time-step restriction (5) can be interpreted as a kind of
Courant-Friedrichs-Lewy restriction arising in the numerical time-integration of PDEs.

If $h$ is of the form (5), with $\gamma \geq \gamma_0$, the acceptance probability does not
degenerate, and the Markov chain arising from the Metropolis-Hastings method can be thought of as
an approximation of the Langevin SDE (1). This Markov chain travels with time-steps $h$ on the
paths of this SDE, and therefore requires a minimal number of steps to reach timescales of
$O(1)$ given by

$$M(d) = d^{\gamma_0} . \tag{7}$$

If it takes $O(1)$ for the limiting SDE to reach stationarity, then we obtain that $M(d)$ gives
the computational complexity of the algorithm.^1

If we now consider the case of a product measure where

$$\pi(x) = \pi_d(x) = Z_d \prod_{i=1}^{d} e^{g(x_i)} , \tag{8}$$

and $Z_d$ is the normalizing constant, then it is well known [20] that for the RWM it holds
$\gamma_0 = 1$, while for MALA it holds $\gamma_0 = 1/3$ [21]. In the next subsection, we recall
the main ideas that allow one to obtain these scalings (valid also for some non-product cases), and
derive a new proposal which we will call the fast Metropolis Adjusted Langevin algorithm
(fMALA) and which satisfies $\gamma_0 = 1/5$ in the product case, i.e. it has a better convergence
scaling.
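
To get a feel for what these exponents mean in practice, the short sketch below tabulates $M(d) = d^{\gamma_0}$ from (7) for the three values of $\gamma_0$ quoted above. It ignores constants and the cost of a single step, so it indicates only the growth rate; the helper names are hypothetical.

```python
# Critical exponents gamma_0 quoted in the text for product targets of the form (8).
CRITICAL_EXPONENT = {"RWM": 1.0, "MALA": 1.0 / 3.0, "fMALA": 1.0 / 5.0}


def steps_to_unit_time(d, gamma0):
    """M(d) = d**gamma0 from (7), ignoring constants and the cost per step."""
    return d ** gamma0


def complexity_table(dimensions):
    """Return {method: [M(d) for d in dimensions]}."""
    return {
        name: [steps_to_unit_time(d, gamma0) for d in dimensions]
        for name, gamma0 in CRITICAL_EXPONENT.items()
    }


if __name__ == "__main__":
    for name, row in complexity_table([10, 1000, 100000]).items():
        print(name, [round(m, 1) for m in row])
```

For instance, at $d = 1000$ the orders of magnitude are 1000 steps for RWM, about 10 for MALA and about 4 for fMALA.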

^1 In this definition of the cost one does not take into account the cost of generating a
proposal. This is discussed in Remark 2.3.
2.2 Formal derivation

Here we explain the main idea that is used for proving the scaling of a Gaussian^2 proposal
in high dimensions. In particular, the proposal $y$ is now of the form

$$y = \mu(x,h) + S(x,h)\xi , \tag{9}$$

where $\xi \sim \mathcal{N}(0, I_d)$ is a standard $d$-dimensional Gaussian random variable. Note
that in the case of the RWM,

$$\mu(x,h) = x , \qquad S(x,h) = \sqrt{h}\,\Sigma^{1/2} ,$$

while in the case of MALA

$$\mu(x,h) = x + (h/2)\Sigma\nabla\log\pi(x) , \qquad S(x,h) = \sqrt{h}\,\Sigma^{1/2} .$$

The acceptance probability can be written in the form

$$\alpha(x,y) = \min\{1, \exp(R_d(x,y))\} ,$$

for some function $R_d(x,y)$ which depends on the Gaussian proposal (9). Now using the fact
that $y$ is related to $x$ according to (9), that $R_d(x,x) = 0$, together with appropriate
smoothness properties on the function $g$, one can expand $R_d$ in powers of $\sqrt{h}$ using a
Taylor expansion

$$R_d(x,y) = \sum_{i=1}^{k}\sum_{j=1}^{d} h^{i/2} C_{ij}(x,\xi) + h^{(k+1)/2} L_{k+1}(x,h^*,\xi) . \tag{10}$$

It turns out [2] that the scaling associated with each proposal relates directly to how many
of the $C_{ij}$ terms are zero in (10). This simplifies if we further assume that $\Sigma = I_d$ in
(1) and that $\pi$ satisfies (8), because we get for all $i \in \{1,\ldots,k\}$,
$j \in \{1,\ldots,d\}$, $C_{ij}(x,\xi) = C_i(x_j,\xi_j)$ and (10) can be written as

$$R_d(x,y) = \sum_{i=1}^{k}\sum_{j=1}^{d} h^{i/2} C_i(x_j,\xi_j) + h^{(k+1)/2} L_{k+1}(x,h^*,\xi) . \tag{11}$$

We then see that if $C_i = 0$ for $i = 1,\ldots,m$, then this implies that $\gamma_0 = 1/(m+1)$.
Indeed, this value of $\gamma_0$ yields $h^{m+1} d = 1$ and the leading order term in (10) becomes

$$\frac{1}{\sqrt{d}}\sum_{j=1}^{d} C_{m+1}(x_j,\xi_j) .$$

To understand the behaviour for large $d$, we typically assume conditions to ensure that the
above term has an appropriate (weak) limit. It turns out that $m+1$ is generally an odd integer
for known proposals, and the above expression is frequently approximated by a central limit
theorem. The second dominant term in (10) turns out to be $C_{2(m+1)}$, although to turn this
into a rigorous proof one also needs to be able to control the appropriate number of higher
order terms, from $m+1$ to $2(m+1)$, as well as the remainder term in the above Taylor
expansion.
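
The counting argument behind $\gamma_0 = 1/(m+1)$ can be written out as a short worked computation. This is a heuristic sketch under two assumptions that are not stated in this form in the text: that $h = \ell^2 d^{-\gamma}$ for a constant $\ell > 0$, and that the terms $C_{m+1}(x_j,\xi_j)$ are i.i.d. with mean zero and finite variance $\sigma^2$ (the symbol $\sigma^2$ is introduced here for illustration).

```latex
% Assumptions (illustrative): h = \ell^2 d^{-\gamma}, and the C_{m+1}(x_j,\xi_j) are
% i.i.d. with mean zero and finite variance \sigma^2.
\begin{align*}
h^{(m+1)/2} \sum_{j=1}^{d} C_{m+1}(x_j,\xi_j)
  &= \ell^{m+1}\, d^{\,1/2 - \gamma(m+1)/2}
     \cdot \frac{1}{\sqrt{d}} \sum_{j=1}^{d} C_{m+1}(x_j,\xi_j), \\
\frac{1}{\sqrt{d}} \sum_{j=1}^{d} C_{m+1}(x_j,\xi_j)
  &\xrightarrow{\ \mathrm{law}\ } \mathcal{N}(0,\sigma^2)
  \quad \text{as } d \to \infty .
\end{align*}
% The prefactor d^{1/2 - \gamma(m+1)/2} stays of order one exactly when
% \gamma = 1/(m+1): m = 0 gives 1 (RWM), m = 2 gives 1/3 (MALA), m = 4 gives 1/5 (fMALA).
```

For $\gamma < 1/(m+1)$ the prefactor diverges and the acceptance probability degenerates, while for $\gamma > 1/(m+1)$ it vanishes and the chain accepts almost every move but takes steps that are smaller than necessary.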

^2 We point out that Gaussianity here is not necessary but it greatly simplifies the calculations.
2.3 Classes of proposals with $\gamma_0 = 1/5$

We introduce new Gaussian proposals for which $\gamma_0 = 1/5$ in (7). We start by presenting
the simplest method, and then give two variations of it, motivated by the desire to obtain
robust and stable ergodic properties (geometric ergodicity). The underlying calculations that
show $C_i = 0$, $i = 1,\ldots,m$ with $m = 4$ and $\gamma_0 = 1/5$ for these methods are contained
in the supplementary materials in the form of a Mathematica file. Recall that
$f(x) = \Sigma\nabla\log\pi(x)$. In the sequel, we denote by $Df$ and $D^2 f$ the Jacobian
($d \times d$-matrix) and the Hessian ($d \times d^2$-matrix) of $f$ respectively. Thus
$(Df(x))_{i,j} = \partial f_i(x)/\partial x_j$ and

$$D^2 f(x) = [H_1(x) \ \cdots \ H_d(x)] , \quad \text{where } \{H_i(x)\}_{j,k} = \frac{\partial^2 f_i(x)}{\partial x_k\,\partial x_j} .$$

Finally, for all $x \in \mathbb{R}^d$, $\{\Sigma : D^2 f(x)\} \in \mathbb{R}^d$ is defined for
$i = 1,\ldots,d$ by

$$\{\Sigma : D^2 f(x)\}_i = \mathrm{trace}\left(\Sigma^T H_i(x)\right) .$$

Notice that for $\Sigma = I_d$, the above quantity reduces to the Laplacian and we have
$\{\Sigma : D^2 f(x)\}_i = \Delta f_i$.

Remark 2.1. Since by assumption $\Sigma$ is positive definite, notice that the Jacobian matrix
$Df(x)$ is diagonalizable for all $x \in \mathbb{R}^d$. Indeed, it is similar to the symmetric matrix
$\Sigma^{-1/2} Df(x) \Sigma^{1/2} = \Sigma^{1/2}\nabla^2\log\pi(x)\Sigma^{1/2}$, and we use that a
symmetric matrix is always diagonalizable. This will permit us to define analytic functionals of
$Df(x)$.

2.3.1 Fast Metropolis-Adjusted Langevin Algorithm (fMALA) 

We first give a natural proposal for which $\gamma_0 = 1/5$ based on the discussion of Section 2.2.
We restrict the class of proposals defined by (9) by setting for all $x \in \mathbb{R}^d$ and $h > 0$,

$$\mu(x,h) = x + h\,\mu_1(x) + h^2\mu_2(x) , \qquad S(x,h) = h^{1/2} S_1(x) + h^{3/2} S_2(x) .$$

By a formal calculation (see the supplementary materials), explicit expressions for the
functions $\mu_1, \mu_2, S_1, S_2$ have to be imposed for the first four terms $C_i(x,\xi)$,
$i \in \{1,2,3,4\}$, in (11) to be zero. This result implies the following definition for $\mu$
and $S$:

$$\mu^{fM}(x,h) = x + \frac{h}{2} f(x) - \frac{h^2}{24}\left(Df(x)f(x) + \{\Sigma : D^2 f(x)\}\right) , \tag{12a}$$
$$S^{fM}(x,h) = \left(h^{1/2} I_d + (h^{3/2}/12)\,Df(x)\right)\Sigma^{1/2} . \tag{12b}$$

We will refer to (9) when $\mu, S$ are given by (12) as the fast Unadjusted Langevin Algorithm
(fULA) when viewed as a numerical method for (1) and as the fast Metropolis-Adjusted
Langevin Algorithm (fMALA) when used as a proposal in the Metropolis-Hastings framework.

Remark 2.2. It is interesting to note that, compared with the Unadjusted Langevin Algorithm
(ULA), fULA has the same order of weak convergence, namely one, if applied as a one-step
integrator for (1). One could obtain a second order weak method by changing the constants in front
of the higher order coefficients, but in fact the corresponding method would not have better
scaling properties than MALA when used in the Metropolis-Hastings framework. This observation
answers negatively, in part, one of the questions in the discussion of [8] about the potential use
of higher order integrators for the Langevin equation within the Metropolis-Hastings framework.
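
As a concrete illustration of the fMALA maps (12a) and (12b), the following sketch specialises them to one dimension with $\Sigma = 1$, where $Df = f'$ and $\{\Sigma : D^2 f\} = f''$. All function names are hypothetical, and the standard normal target is chosen only because its derivatives are trivial; this is a sketch under those assumptions rather than the implementation used for the experiments.

```python
import math


def fmala_mean(x, h, f, df, d2f):
    """Mean map (12a) in one dimension with Sigma = 1, where {Sigma : D^2 f} = f''."""
    return x + 0.5 * h * f(x) - (h ** 2 / 24.0) * (df(x) * f(x) + d2f(x))


def fmala_scale(x, h, df):
    """Scale map (12b) in one dimension: sqrt(h) + (h**1.5 / 12) * f'(x)."""
    return math.sqrt(h) + (h ** 1.5 / 12.0) * df(x)


def fmala_propose(x, h, f, df, d2f, xi):
    """Gaussian proposal (9) with the fMALA maps, driven by a N(0, 1) draw xi."""
    return fmala_mean(x, h, f, df, d2f) + fmala_scale(x, h, df) * xi


def fmala_log_q(x, y, h, f, df, d2f):
    """Log proposal density q(x, y); assumes fmala_scale(x, h, df) is nonzero."""
    s = fmala_scale(x, h, df)
    z = (y - fmala_mean(x, h, f, df, d2f)) / s
    return -0.5 * z * z - math.log(abs(s)) - 0.5 * math.log(2.0 * math.pi)


# Illustrative target pi = N(0, 1): f(x) = -x, f'(x) = -1, f''(x) = 0.
def f_normal(x):
    return -x


def df_normal(x):
    return -1.0


def d2f_normal(x):
    return 0.0
```

Note that the scale in (12b) depends on the current state through $f'(x)$, so the proposal density must be evaluated with the scale at $x$ for the forward move and at $y$ for the reverse move when forming the ratio (2).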
Remark 2.3. The proposal given by equation (12) contains higher order derivatives of the
vector field $f(x)$, resulting in higher computational cost than the standard MALA proposal.
This additional cost might offset the benefits of the improved scaling, since the corresponding
Jacobian and Hessian can be full matrices in general. However, there exist cases of interest^3
where, due to the structure of the Jacobian and Hessian, the computational cost of the fMALA
proposal is of the same order with respect to the dimension $d$ as for the MALA proposal.
Furthermore, we note that one possible way to avoid derivatives is by using finite differences
or Runge-Kutta type approximations of the proposal (12). This, however, is out of the scope
of the present paper.

2.3.2 Modified Ozaki-Metropolis algorithm (mOMA) 

One of the problems related to the MALA proposal is that it fails to be geometrically ergodic
for a wide range of targets $\pi$ [24]. This issue was addressed in [23] where a modification of
MALA based on the Ozaki discretization [18] of (1) was proposed and studied. In the same
spirit as in [23] we propose here a modification of fMALA, defined by

$$\mu^{mO}(x,h) = x + \mathcal{T}_1(Df(x),h,1)f(x) - (h^2/6)\,Df(x)f(x) - (h^2/24)\{\Sigma : D^2 f(x)\} , \tag{13a}$$
$$S^{mO}(x,h) = \left(\mathcal{T}_1(Df(x),2h,1) - (h^2/3)\,Df(x)\right)^{1/2}\Sigma^{1/2} , \tag{13b}$$

where

$$\mathcal{T}_1(M,h,a) = (aM)^{-1}\left(e^{(ah/2)M} - I_d\right) \tag{14}$$

for all^4 $M \in \mathbb{R}^{d\times d}$, $h > 0$, $a \in \mathbb{R}$.

The Markov chain defined by (13) will be referred to as the modified unadjusted Ozaki
algorithm (mUOA), whereas when it is used in a Hastings-Metropolis algorithm, it will be
referred to as the modified Ozaki Metropolis algorithm (mOMA). Note that
$t \mapsto (e^{ht} - 1)/t - (1/3)h^2 t$ is positive on $\mathbb{R}$ for all $h > 0$. It then follows
from Remark 2.1 that for all $x \in \mathbb{R}^d$, the matrix
$\mathcal{T}_1(Df(x),2h,1) - (h^2/3)Df(x)$ is diagonalizable with non-negative eigenvalues,
which makes it possible to define its matrix square-root, and $S^{mO}(x,h)$ is well defined for
all $x \in \mathbb{R}^d$ and $h > 0$.

Remark 2.4. In regions where $\|\Sigma\nabla\log\pi(x)\|$ is much greater than $\|x\|$, we need in
practice to take $h$ very small (of order $\|x\| / \|\Sigma\nabla\log\pi(x)\|$) for MALA to exit
these regions. However such a choice of $h$ depends on $x$ and cannot be used directly. Such a value
of $h$ can therefore be hard to find theoretically as well as computationally. This issue can be
tackled by multiplying $f(x) = \Sigma\nabla\log\pi(x)$ by $\mathcal{T}_1(Df(x),h,a)$ in (13a).
Indeed under some mild conditions, in that case, we can obtain an algorithm with good mixing
properties for all $h > 0$; see [23, Theorem 4.1]. mOMA faces similar problems due to the term
$Df(x)f(x)$.

^3 We study one of those in Section 5.

^4 Notice that the matrix functionals in (14), (16), (17) remain valid if the matrix $aM$ is not
invertible, using the appropriate power series for the matrix exponentials.

2.3.3 Generalised Boosted Ozaki-Metropolis Algorithm (gbOMA)

Having discussed the possible limitations of mOMA in Remark 2.4 we generalise here the
approach in [23] to deal with the complexities arising from the presence of the $Df(x)f(x)$ term.
In particular we now define

$$\mu^{gbO}(x,h) = x + \mathcal{T}_1(Df(x),h,a_1)f(x) + ((a_1/2) + (1/6))\,\mathcal{T}_2(Df(x),h,a_2)f(x) - (1/3)\,\mathcal{T}_3(Df(x),h,a_3)\{\Sigma : D^2 f(x)\} , \tag{15a}$$
$$S^{gbO}(x,h) = \left(\mathcal{T}_1(Df(x),2h,a_4) + ((a_4/2) - (1/6))\,\mathcal{T}_2(Df(x),2h,a_5)\right)^{1/2}\Sigma^{1/2} , \tag{15b}$$

where $a_i$, $i = 1,\ldots,5$ are positive parameters, $\mathcal{T}_1$ is given by (14) and

$$\mathcal{T}_2(M,h,a) = (aM)^{-1}\left(e^{-(a^2h^2/4)M^2} - I_d\right) , \tag{16}$$
$$\mathcal{T}_3(M,h,a) = (aM)^{-2}\left(e^{(ah/2)M} - I_d - (ah/2)M\right) , \tag{17}$$

with $M \in \mathbb{R}^{d\times d}$, $h > 0$, $a \in \mathbb{R}$ and $I_d$ the identity matrix. The
Markov chain defined by (15) will be referred to as the generalised boosted unadjusted Ozaki
algorithm (gbUOA), whereas when it is used in a Hastings-Metropolis algorithm, it will be referred
to as the generalised boosted Ozaki Metropolis algorithm (gbOMA). Note that $S^{gbO}$ in (15b) is
not always well defined in general. However, using Remark 2.1, the following condition is
sufficient to define $S^{gbO}$ with the square-root of a diagonalizable matrix with non-negative
eigenvalues.

Assumption 1. The function
$t \mapsto (e^{a_4 t} - 1)/(a_4 t) + (a_4/2 - (1/6))\,(e^{-(a_5 t)^2} - 1)/(a_5 t)$ is positive
on $\mathbb{R}$.

For $a_4 = a_5 = 1$, this assumption is satisfied, and choosing $a_i = 1$ for all $i = 1,\ldots,5$,
(15) leads to a well defined proposal, which will be referred to as the boosted Unadjusted Ozaki
Algorithm (bUOA), whereas when it is used in a Hastings-Metropolis algorithm, it will be
referred to as the boosted Ozaki Metropolis Algorithm (bOMA). We will see in Section 4
that bOMA has nicer ergodic properties than fMALA.
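
The role of the functionals in (14), (16) and (17) is easiest to see in one dimension, where they become scalar functions of $m = f'(x)$. The sketch below is an illustration under stated assumptions: it takes $\Sigma = 1$ and all $a_i = 1$ (the bOMA case), the function names are hypothetical, and the closed form used in `t2` is the scalar reading $(e^{-(ahm/2)^2} - 1)/(am)$ of (16). It shows that the resulting mean and scale agree with the fMALA maps $x + (h/2)f - (h^2/24)(f'f + f'')$ and $h^{1/2} + (h^{3/2}/12)f'$ up to higher order terms in $h$, and it evaluates the function of Assumption 1 on a grid.

```python
import math


def t1(m, h, a):
    """Scalar version of (14): (exp(a*h*m/2) - 1) / (a*m), with limit h/2 at m = 0."""
    return 0.5 * h if a * m == 0.0 else math.expm1(0.5 * a * h * m) / (a * m)


def t2(m, h, a):
    """Scalar version of (16): (exp(-(a*h*m/2)**2) - 1) / (a*m), with limit 0 at m = 0."""
    return 0.0 if a * m == 0.0 else math.expm1(-(0.5 * a * h * m) ** 2) / (a * m)


def t3(m, h, a):
    """Scalar version of (17); a short power series is used near m = 0."""
    z = 0.5 * a * h * m
    if abs(z) < 1e-4:
        return 0.25 * h * h * (0.5 + z / 6.0 + z * z / 24.0)
    return (math.expm1(z) - z) / (a * m) ** 2


def boma_mean(x, h, fx, dfx, d2fx):
    """bOMA mean (all a_i = 1, Sigma = 1) from the values f(x), f'(x), f''(x)."""
    return (x + t1(dfx, h, 1.0) * fx + (2.0 / 3.0) * t2(dfx, h, 1.0) * fx
            - (1.0 / 3.0) * t3(dfx, h, 1.0) * d2fx)


def boma_scale(h, dfx):
    """bOMA scale: square root of t1(., 2h, 1) + (1/3) * t2(., 2h, 1)."""
    return math.sqrt(t1(dfx, 2.0 * h, 1.0) + t2(dfx, 2.0 * h, 1.0) / 3.0)


def assumption1(t, a4=1.0, a5=1.0):
    """The function of Assumption 1 evaluated at t (its limit at t = 0 is 1)."""
    if t == 0.0:
        return 1.0
    return (math.expm1(a4 * t) / (a4 * t)
            + (0.5 * a4 - 1.0 / 6.0) * math.expm1(-(a5 * t) ** 2) / (a5 * t))
```

The special cases at $m = 0$ correspond to the power series mentioned in the footnote on non-invertible $aM$. A grid evaluation of `assumption1` is of course only a numerical sanity check for $a_4 = a_5 = 1$, not a proof of positivity.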

3 Main scaling results 

In this section, we present the optimal scaling results for fMALA and gbOMA introduced in
Section 2. We recall from the discussion in Section 2 that the parameter $h$ depends on the
dimension and is given as $h_d = \ell^2 d^{-1/5}$, with $\ell > 0$. Finally, we prove our results
for the case of target distributions of the product form given by (8), we take $\Sigma = I_d$, and
make the following assumptions on $g$.

Assumption 2. We assume

1. $g \in C^{10}(\mathbb{R})$ and $g''$ is bounded on $\mathbb{R}$.

2. The derivatives of $g$ up to order 10 have at most polynomial growth, i.e. there exist
constants $C, \kappa$ such that

$$|g^{(i)}(t)| \leq C(1 + |t|^{\kappa}) , \quad t \in \mathbb{R}, \ i = 1,\ldots,10 .$$

3. for all $k \in \mathbb{N}$, $\int_{\mathbb{R}} t^k e^{g(t)}\,\mathrm{d}t < +\infty$.
3.1 Optimal scaling of fMALA

The Markov chain produced by fMALA, with target density $\pi_d$ and started at stationarity,
will be denoted by $\{X_k^{d,fM}, k \in \mathbb{N}\}$. Let $q_d^{fM}$ be the transition density
associated with the proposal of fMALA relative to $\pi_d$. In a similar manner, we denote by
$\alpha_d^{fM}$ the acceptance probability. Now we introduce the jump process based on
$\{X_k^{d,fM}, k \in \mathbb{N}\}$, which allows us to compare this Markov chain to a
continuous-time process. Let $\{J_t, t \in \mathbb{R}_+\}$ be a Poisson process with rate $d^{1/5}$
and let $\Gamma^{d,fM} = \{\Gamma_t^{d,fM}, t \in \mathbb{R}_+\}$ be the $d$-dimensional jump
process defined by $\Gamma_t^{d,fM} = X_{J_t}^{d,fM}$. We denote by

$$a_d^{fM}(\ell) = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \pi_d(x)\,q_d^{fM}(x,y)\,\alpha_d^{fM}(x,y)\,\mathrm{d}x\,\mathrm{d}y$$

the mean under $\pi_d$ of the acceptance rate.

Theorem 3.1. Assume Assumption 2. Then

$$\lim_{d\to+\infty} a_d^{fM}(\ell) = a^{fM}(\ell) ,$$

where $a^{fM}(\ell) = 2\Phi(-K^{fM}\ell^5/2)$ with
$\Phi(t) = (1/\sqrt{2\pi})\int_{-\infty}^{t} e^{-s^2/2}\,\mathrm{d}s$ and the expression of
$K^{fM}$ is given in Appendix D.

Theorem 3.2. Assume Assumption 2. Let $\{Y_t^{d,fM} = \Gamma_{t,1}^{d,fM}, t \in \mathbb{R}_+\}$ be
the process corresponding to the first component of $\Gamma^{d,fM}$. Then,
$\{Y^{d,fM}, d \in \mathbb{N}^*\}$ converges weakly (in the Skorokhod topology), as $d \to \infty$,
to the solution $\{Y_t^{fM}, t \in \mathbb{R}_+\}$ of the Langevin equation defined by:

$$\mathrm{d}Y_t^{fM} = (h^{fM}(\ell))^{1/2}\,\mathrm{d}B_t + (1/2)\,h^{fM}(\ell)\,\nabla\log\pi_1(Y_t^{fM})\,\mathrm{d}t , \tag{18}$$

where $h^{fM}(\ell) = 2\ell^2\Phi(-K^{fM}\ell^5/2)$ is the speed of the limiting diffusion.
Furthermore, $h^{fM}(\ell)$ is maximised at the unique value of $\ell$ for which
$a^{fM}(\ell) = 0.704343$.

Proof. The proofs of these two theorems are in Appendix A. □
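
The value 0.704343 can be recovered numerically from the form of the speed function alone. Writing $u = K\ell^5/2$, the speed $2\ell^2\Phi(-K\ell^5/2)$ is proportional to $u^{2/5}\Phi(-u)$, so the maximiser solves $(2/5)\Phi(-u) = u\,\varphi(u)$, where $\varphi$ denotes the standard normal density, and the constant $K$ drops out. The sketch below solves this equation by bisection; the function names and the bracketing interval are choices made for this illustration.

```python
import math


def std_normal_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def optimal_acceptance(power, tol=1e-13):
    """Acceptance rate at the maximiser of h(l) = 2 * l**2 * Phi(-K * l**power / 2).

    With u = K * l**power / 2, h is proportional to u**(2/power) * Phi(-u), so the
    maximiser solves (2/power) * Phi(-u) = u * phi(u); K drops out of the answer.
    """
    def stationarity(u):
        return (2.0 / power) * std_normal_cdf(-u) - u * std_normal_pdf(u)

    lo, hi = 1e-9, 5.0            # stationarity(lo) > 0 > stationarity(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 2.0 * std_normal_cdf(-0.5 * (lo + hi))
```

Calling `optimal_acceptance(5)` reproduces the fMALA value of Theorem 3.2. The same stationarity equation with exponents 3 and 1 returns the familiar values of about 0.574 and 0.234 associated with MALA and RWM, which is a useful consistency check on the computation.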

Remark 3.3. The above analysis shows that for fMALA, the optimal exponent defined in
(6) is given by $\gamma_0 = 1/5$ as discussed in Section 2.2. Indeed, if $h_d$ has the form
$\ell^2 d^{-1/5+\epsilon}$, then an adaptation of the proof of Theorem 3.1 implies that for all
$\ell > 0$, if $\epsilon \in (0,1/5)$, $\lim_{d\to+\infty} a_d^{fM}(\ell) = 0$. In contrast, if
$\epsilon < 0$ then $\lim_{d\to+\infty} a_d^{fM}(\ell) = 1$.

3.2 Scaling results for gbOMA 

As in the case of fMALA, we assume $\pi_d$ is of the form (8) and we take $\Sigma = I_d$,
$h_d = \ell^2 d^{-1/5}$. The Metropolis-adjusted Markov chain based on gbOMA, with target density
$\pi_d$ and started at stationarity, is denoted by $\{X_k^{d,gbO}, k \in \mathbb{N}\}$. We will
denote by $q_d^{gbO}$ the transition density associated with the proposals defined by gbOMA with
respect to $\pi_d$. In a similar manner, the acceptance probability relative to $\pi_d$ and gbOMA
will be denoted by $\alpha_d^{gbO}$. Let $\{J_t, t \in \mathbb{R}_+\}$ be a Poisson process with
rate $d^{1/5}$, and let $\Gamma^{d,gbO} = \{\Gamma_t^{d,gbO}, t \in \mathbb{R}_+\}$ be the
$d$-dimensional jump process defined by $\Gamma_t^{d,gbO} = X_{J_t}^{d,gbO}$. Denote also by

$$a_d^{gbO}(\ell) = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \pi_d(x)\,q_d^{gbO}(x,y)\,\alpha_d^{gbO}(x,y)\,\mathrm{d}x\,\mathrm{d}y$$

the mean under $\pi_d$ of the acceptance rate of the algorithm.
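Theorem 3.4. Assume Assumptions 1 and 2. Then

$$\lim_{d\to+\infty} a_d^{gbO}(\ell) = a^{gbO}(\ell) ,$$

where $a^{gbO}(\ell) = 2\Phi(-K^{gbO}\ell^5/2)$ with
$\Phi(t) = (1/\sqrt{2\pi})\int_{-\infty}^{t} e^{-s^2/2}\,\mathrm{d}s$ and $K^{gbO}$ is given in
Appendix D.

Theorem 3.5. Assume Assumptions 1 and 2. Let
$\{G_t^{d,gbO} = \Gamma_{t,1}^{d,gbO}, t \in \mathbb{R}_+\}$ be the process corresponding to the
first component of $\Gamma^{d,gbO}$. Then, $\{G^{d,gbO}, d \in \mathbb{N}^*\}$ converges weakly (in
the Skorokhod topology) to the solution $\{G_t^{gbO}, t \in \mathbb{R}_+\}$ of the Langevin equation
defined by:

$$\mathrm{d}G_t^{gbO} = (h^{gbO}(\ell))^{1/2}\,\mathrm{d}B_t + (1/2)\,h^{gbO}(\ell)\,\nabla\log\pi_1(G_t^{gbO})\,\mathrm{d}t ,$$

where $h^{gbO}(\ell) = 2\ell^2\Phi(-K^{gbO}\ell^5/2)$ is the speed of the limiting diffusion.
Furthermore, $h^{gbO}(\ell)$ is maximised at the unique value of $\ell$ for which
$a^{gbO}(\ell) = 0.704343$.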

Proof. Note that under Assumption 2-1, at fixed $a > 0$, using the regularity properties of
$(x,h) \mapsto \mathcal{T}_i(x,h,a)$ on $\mathbb{R}^2$ for $i = 1,\ldots,3$, there exist an open
interval $I$, which contains 0, and $M_0 \geq 0$ such that for all $x \in \mathbb{R}$,
$k = 1,\ldots,11$, and $i = 1,\ldots,3$

$$\left|\frac{\partial^k\left(\mathcal{T}_i(g''(x),h,a)\right)}{\partial h^k}\right| \leq M_0 \quad \forall h \in I .$$

Using in addition Assumption 1 there exists $m_0 > 0$ such that for all $h \in I$ and for all
$x \in \mathbb{R}$,

$$\mathcal{T}_1(g''(x),2h,a_4) + ((a_4/2) - (1/6))\,\mathcal{T}_2(g''(x),2h,a_5) \geq m_0 .$$

Using these two results, the proof of both theorems follows the same lines as Theorems 3.1
and 3.2, which can be found in Appendix A. □


4 Geometric ergodicity results for high order Langevin schemes 

Having established the scaling behaviour of the different proposals in the previous section, we 
now proceed with establishing geometric ergodicity results for our new Metropolis algorithms. 
Furthermore, for completeness, we study the behaviour of the corresponding unadjusted 
proposal. For simplicity, we will take in the following $\Sigma = I_d$ and we limit our study of
gbOMA to that of bOMA, which is given by:

$$\mu^{bO}(x,h) = x + \mathcal{T}_1(Df(x),h,1)f(x) + (2/3)\,\mathcal{T}_2(Df(x),h,1)f(x) - (1/3)\,\mathcal{T}_3(Df(x),h,1)\{\Sigma : D^2 f(x)\} ,$$
$$S^{bO}(x,h) = \left(\mathcal{T}_1(Df(x),2h,1) + (1/3)\,\mathcal{T}_2(Df(x),2h,1)\right)^{1/2} ,$$

where $\mathcal{T}_1$, $\mathcal{T}_2$ and $\mathcal{T}_3$ are respectively defined by (14), (16)
and (17). First, let us begin with some definitions. For a signed measure $\nu$ on $\mathbb{R}^d$,
we define the total variation norm of $\nu$ by

$$\|\nu\|_{TV} = \sup_{A \in \mathcal{B}(\mathbb{R}^d)} |\nu(A)| ,$$

where $\mathcal{B}(\mathbb{R}^d)$ is the Borel $\sigma$-algebra of $\mathbb{R}^d$. Let $P$ be a
Markov kernel with invariant measure $\pi$. For a given measurable function
$V : \mathbb{R}^d \to [1,+\infty)$, we will say that $P$ is $V$-geometrically ergodic if there exist
$C \geq 0$ and $\rho \in [0,1)$ such that for all $x \in \mathbb{R}^d$ and $n \geq 0$

$$\|P^n(x,\cdot) - \pi\|_V \leq C\rho^n V(x) ,$$

where for $\nu$ a signed measure on $\mathbb{R}^d$, the $V$-norm $\|\cdot\|_V$ is defined by

$$\|\nu\|_V = \sup_{\{f \,;\, |f| \leq V\}} \left|\int_{\mathbb{R}^d} f(x)\,\nu(\mathrm{d}x)\right| .$$

We refer the reader to [17] for the definitions of small sets, $\varphi$-irreducibility and
transience. Let $P$ be a Markov kernel on $\mathbb{R}^d$, $\mathrm{Leb}^d$-irreducible, where
$\mathrm{Leb}^d$ is the Lebesgue measure on $\mathbb{R}^d$, and aperiodic, and let
$V : \mathbb{R}^d \to [1,+\infty)$ be a measurable function. In order to establish that
$P$ is $V$-geometrically ergodic, a sufficient and necessary condition is given by a geometrical
drift (see [17, Theorem 15.0.1]), namely for some small set $C$, there exist $\lambda < 1$ and
$b < +\infty$ such that for all $x \in \mathbb{R}^d$:

$$PV(x) \leq \lambda V(x) + b\,\mathbb{1}_C(x) . \tag{20}$$

Note that the different considered proposals belong to the class of Gaussian Markov
kernels. Namely, let $Q$ be a Markov kernel on $\mathbb{R}^d$. We say that $Q$ is a Gaussian Markov
kernel if for all $x \in \mathbb{R}^d$, $Q(x,\cdot)$ is a Gaussian measure, with mean $\mu(x)$ and
covariance matrix $S(x)S^T(x)$, where $x \mapsto \mu(x)$ and $x \mapsto S(x)$ are measurable
functions from $\mathbb{R}^d$ to respectively $\mathbb{R}^d$ and $\mathcal{S}_+^*(\mathbb{R}^d)$,
the set of symmetric positive definite matrices of dimension $d$. These two functions will be
referred to as the mean value map and the variance map respectively. The Markov kernel $Q$ has
transition density $q$ given by:

$$q(x,y) = \frac{1}{(2\pi)^{d/2}|S(x)|}\exp\left(-\frac{1}{2}\left\langle S(x)^{-2}(y - \mu(x)), (y - \mu(x))\right\rangle\right) , \tag{21}$$

where for $M \in \mathbb{R}^{d\times d}$, $|M|$ denotes the determinant of $M$. Geometric ergodicity
of Markov chains with Gaussian Markov kernels and the corresponding Metropolis-Hastings algorithms
was the subject of study of [24, 9]. But contrary to [9], we make for simplicity the following
assumption on the functions $\mu : \mathbb{R}^d \to \mathbb{R}^d$ and
$S : \mathbb{R}^d \to \mathcal{S}_+^*(\mathbb{R}^d)$:

Assumption 3. The functions $x \mapsto \mu(x)$ and $x \mapsto S(x)$ are continuous.

Note that if $\pi$, a target probability measure on $\mathbb{R}^d$, is absolutely continuous with
respect to the Lebesgue measure with density still denoted by $\pi$, the following assumption
ensures that the various proposals introduced in this paper satisfy Assumption 3:

Assumption 4. The log-density $g$ of $\pi$ belongs to $C^3(\mathbb{R}^d)$.

We proceed in Section 4.1 with presenting and extending where necessary the main results
about geometric ergodicity of Metropolis-Hastings algorithms using Gaussian proposals. In
Section 4.2, we then introduce two different potential classes to which we apply our results
in Section 4.3. Finally in Section 4.4, for completeness, we carry out the same kind of study
for unadjusted Gaussian Markov kernels on $\mathbb{R}$.
4.1 Geometric ergodicity of Hastings-Metropolis algorithm based on Gaussian Markov kernel

We first present an extension of the result given in [9] for geometric ergodicity of
Metropolis-Hastings algorithms based on Gaussian proposal kernels. In particular, let $Q$ be a
Gaussian Markov kernel with mean value map and variance map satisfying Assumption 3. We use such a
proposal in a Metropolis algorithm with target density $\pi$ satisfying Assumption 4. Then, the
produced Markov kernel $P$ is given by

$$P(x,\mathrm{d}y) = \alpha(x,y)q(x,y)\,\mathrm{d}y + \delta_x(\mathrm{d}y)\int_{\mathbb{R}^d}(1 - \alpha(x,z))\,q(x,z)\,\mathrm{d}z , \tag{22}$$

where $q$ and $\alpha$ are respectively given by (21) and (2).

Assumption 5. We assume $\liminf_{\|x\|\to+\infty}\int_{\mathbb{R}^d}\alpha(x,y)q(x,y)\,\mathrm{d}y > 0$.

Note that this condition is necessary to obtain the geometric ergodicity of a
Metropolis-Hastings algorithm by [25, Theorem 5.1]. We shall follow a well-known technique in MCMC
theory in demonstrating that Assumption 5 allows us to ensure that geometric ergodicity
of the algorithm is inherited from that of the proposal Markov chain itself. Thus, in the
following lemma we combine the conditions given by [9], which imply geometric ergodicity
of Gaussian Markov kernels, with Assumption 5 to get geometric ergodicity of the resultant
Metropolis-Hastings Markov kernels.

Lemma 4.1. Assume Assumptions 3, 5, and that there exists $\tau \in (0,1)$ such that

$$\limsup_{\|x\|\to+\infty} \|\mu(x)\| / \|x\| = \tau , \quad \text{and} \quad \limsup_{\|x\|\to+\infty} \|S(x)\| / \|x\| = 0 . \tag{23}$$

Then, the Markov kernel $P$ given by (22) is $V$-geometrically ergodic, where $V(x) = 1 + \|x\|^2$.

Proof. The proof is postponed to Appendix B.1. □
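
To see how a drift condition of the form (20) arises from conditions like (23), it helps to work through the simplest possible case by hand. The sketch below is a hypothetical one-dimensional example, not one of the proposals of this paper: an unadjusted Gaussian kernel with mean `tau * x` and constant standard deviation `sigma`, for which $QV(x) = 1 + \tau^2 x^2 + \sigma^2$ in closed form when $V(x) = 1 + x^2$. The constants returned by `drift_constants` are one admissible choice among many, and the interval $C = [-R, R]$ is taken to be the small set, which is an assumption stated here rather than verified.

```python
import math


def lyapunov(x):
    """V(x) = 1 + x**2, the one-dimensional version of V in Lemma 4.1."""
    return 1.0 + x * x


def kernel_applied_to_v(x, tau, sigma):
    """QV(x) = E[V(tau*x + sigma*xi)] for xi ~ N(0, 1), in closed form."""
    return 1.0 + (tau * x) ** 2 + sigma ** 2


def drift_constants(tau, sigma):
    """One admissible choice (lam, b, radius) for (20) with C = [-radius, radius]."""
    lam = 0.5 * (1.0 + tau ** 2)          # any value in (tau**2, 1) would do
    b = (1.0 - lam) + sigma ** 2
    radius = math.sqrt(b / (lam - tau ** 2))
    return lam, b, radius


def drift_holds(x, tau, sigma):
    """Check the inequality (20) at x, with a tiny tolerance for rounding."""
    lam, b, radius = drift_constants(tau, sigma)
    indicator = 1.0 if abs(x) <= radius else 0.0
    return kernel_applied_to_v(x, tau, sigma) <= lam * lyapunov(x) + b * indicator + 1e-12
```

The computation only uses that the mean contracts towards the origin by a factor strictly smaller than one and that the variance stays bounded, which is the one-dimensional content of (23). Lemma 4.1 then transfers this behaviour from the proposal to the Metropolis-Hastings kernel under Assumption 5.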

We now provide some conditions which imply that $P$ is not geometrically ergodic.

Theorem 4.2. Assume Assumptions 3, 4, that $\pi$ is bounded and that there exists
$\epsilon > 0$ such that

$$\liminf_{\|x\|\to+\infty} \|S(x)^{-1}\mu(x)\|\,\|x\|^{-1} > \epsilon^{-1} , \qquad \liminf_{\|x\|\to+\infty}\,\inf_{\|y\|=1} \|S(x)y\| \geq \epsilon , \tag{24}$$

and

$$\lim_{\|x\|\to+\infty} \log(|S(x)|) / \|x\|^2 = 0 . \tag{25}$$

Then, $P$ is not geometrically ergodic.

Proof. The proof is postponed to Appendix B.2. □
4.2 Exponential potentials

We illustrate our results on the following classes of densities.

4.2.1 The one-dimensional class $\mathcal{E}(\beta,\gamma)$

Let $\pi$ be a probability density on $\mathbb{R}$ with respect to the Lebesgue measure. We will
say that $\pi \in \mathcal{E}(\beta,\gamma)$ if $\pi$ is positive, belongs to $C^3(\mathbb{R})$ and
there exists $R_\pi \geq 0$ such that for all $x \in \mathbb{R}$, $|x| \geq R_\pi$,

$$\pi(x) \propto e^{-\gamma|x|^{\beta}} .$$

Then for $|x| \geq R_\pi$, $\log(\pi(x))' = -\gamma\beta\,|x|^{\beta}/x$,
$\log(\pi(x))'' = -\gamma\beta(\beta-1)\,|x|^{\beta}/x^2$ and
$\log(\pi(x))^{(3)} = -\gamma\beta(\beta-1)(\beta-2)\,|x|^{\beta}/x^3$.
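
The three derivative formulas for the class $\mathcal{E}(\beta,\gamma)$ are easy to get wrong by a sign when $x < 0$, so a finite-difference check is a cheap safeguard. The sketch below assumes that the tail form $-\gamma|x|^\beta$ of the log-density holds at the points tested (that is, $|x| \geq R_\pi$). The helper names and the step size are choices made for this illustration.

```python
def log_pi(x, beta, gamma):
    """log pi(x) = -gamma * |x|**beta, up to an additive constant, for |x| >= R_pi."""
    return -gamma * abs(x) ** beta


def d1_log_pi(x, beta, gamma):
    return -gamma * beta * abs(x) ** beta / x


def d2_log_pi(x, beta, gamma):
    return -gamma * beta * (beta - 1.0) * abs(x) ** beta / x ** 2


def d3_log_pi(x, beta, gamma):
    return -gamma * beta * (beta - 1.0) * (beta - 2.0) * abs(x) ** beta / x ** 3


def central_difference(func, x, step=1e-5):
    """Second-order accurate numerical derivative of func at x."""
    return (func(x + step) - func(x - step)) / (2.0 * step)
```

Each formula can be compared with the central difference of the previous one, on both sides of the origin and for values of $\beta$ below and above 2.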


4.2.2 The multidimensional exponential class 

Let $\pi$ be a probability density on $\mathbb{R}^d$ with respect to the Lebesgue measure. We will say that $\pi$ belongs to the multidimensional exponential class if it is positive, belongs to $C^3(\mathbb{R}^d)$ and there exists $R_\pi \geq 0$ such that for all $x \in \mathbb{R}^d$, $\|x\| \geq R_\pi$,

$$\pi(x) \propto e^{-q(x)} ,$$

where $q$ is a function of the following form. There exist a homogeneous polynomial $p$ of degree $m$ and a three-times continuously differentiable function $r$ on $\mathbb{R}^d$ satisfying

$$\|D^2(\nabla r)(x)\| = o(\|x\|^{m-3}) \quad \text{as } \|x\| \to +\infty , \tag{26}$$

and for all $x \in \mathbb{R}^d$,

$$q(x) = p(x) + r(x) .$$

Recall that $p$ is a homogeneous polynomial of degree $m$ if for all $t \in \mathbb{R}$ and $x \in \mathbb{R}^d$, $p(tx) = t^m p(x)$. Finally, we single out the subclass of such densities $\pi$ for which the Hessian of $p$ at $x$, $\nabla^2 p(x)$, is positive definite for all $x \neq 0$.

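When $p$ is a homogeneous polynomial of degree $m$, it can be written as

$$p(x) = \sum_{|\mathbf{k}| = m} a_{\mathbf{k}}\, x^{\mathbf{k}} ,$$

where $\mathbf{k} \in \mathbb{N}^d$, $|\mathbf{k}| = \sum_i k_i$ and $x^{\mathbf{k}} = x_1^{k_1} \cdots x_d^{k_d}$. Then, denoting $n_x = x/\|x\|$, it is easy to see that the following relations hold for all $x \in \mathbb{R}^d \setminus \{0\}$:

$$p(x) = \|x\|^{m}\, p(n_x) \tag{27}$$

$$\nabla p(x) = \|x\|^{m-1}\, \nabla p(n_x) \tag{28}$$

$$\nabla^2 p(x) = \|x\|^{m-2}\, \nabla^2 p(n_x) \tag{29}$$

$$D^2(\nabla p)(x) = \|x\|^{m-3}\, D^2(\nabla p)(n_x) \tag{30}$$

$$\langle \nabla p(x), x \rangle = m\, p(x) \tag{31}$$

$$\nabla^2 p(x)\, x = (m-1)\, \nabla p(x) \tag{32}$$

$$\langle \nabla^2 p(x)\, x, x \rangle = m(m-1)\, p(x) . \tag{33}$$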
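
As a sanity check of the homogeneity relations, the sketch below evaluates (27)-(29) and (31)-(33) on one example polynomial. The choice $p(x) = x_1^4 + x_1^2 x_2^2 + 2 x_2^4$ (degree $m = 4$, $d = 2$) is an assumption made only for illustration; its gradient and Hessian are coded by hand, and its Hessian is positive definite away from the origin.

```python
import numpy as np

M = 4  # degree of the example polynomial


def p(x):
    x1, x2 = x
    return x1**4 + x1**2 * x2**2 + 2.0 * x2**4


def grad_p(x):
    x1, x2 = x
    return np.array([4 * x1**3 + 2 * x1 * x2**2,
                     2 * x1**2 * x2 + 8 * x2**3])


def hess_p(x):
    x1, x2 = x
    return np.array([[12 * x1**2 + 2 * x2**2, 4 * x1 * x2],
                     [4 * x1 * x2, 2 * x1**2 + 24 * x2**2]])


def check_relations(x):
    """Return, for each relation, whether it holds numerically at the point x != 0."""
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)
    n = x / r
    g, H = grad_p(x), hess_p(x)
    return {
        "(27)": bool(np.isclose(p(x), r**M * p(n))),
        "(28)": bool(np.allclose(g, r ** (M - 1) * grad_p(n))),
        "(29)": bool(np.allclose(H, r ** (M - 2) * hess_p(n))),
        "(31)": bool(np.isclose(g @ x, M * p(x))),
        "(32)": bool(np.allclose(H @ x, (M - 1) * g)),
        "(33)": bool(np.isclose(x @ H @ x, M * (M - 1) * p(x))),
    }
```

Relation (32) is the one used later to identify $c(n) = 1/(m-1)$: it gives $\nabla^2 p(x)^{-1} \nabla p(x) = x/(m-1)$ whenever the Hessian is invertible.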


From (29), it follows that $\nabla^2 p(x)$ is positive definite for all $x \in \mathbb{R}^d \setminus \{0\}$ if and only if $\nabla^2 p(n)$ is positive definite for all $n$ with $\|n\| = 1$. Consequently, this positive-definiteness condition can hold only if $m \geq 2$.

4.3 Geometric ergodicity of the proposals: the case of Metropolis-Hastings algorithms

In this section we study the behaviour of our proposals within the Metropolis-Hastings framework. We split our investigation into two parts: in the first we study fMALA and mOMA, while in the second we take a more detailed look at the properties of bOMA, not only for the class $\mathcal{E}(\beta,\gamma)$ but also for the polynomial class.

4.3.1 Geometric ergodicity of fMALA, mOMA for the class $\mathcal{E}(\beta,\gamma)$

In the case $\beta \in (0,2)$, fMALA and mOMA have their mean map behaving like $x - h\beta\gamma\, x|x|^{\beta-2}/2$ at infinity and their variance map bounded from above. This is exactly the behaviour that MALA [24] has for the same values of $\beta$, thus one would expect them to behave in the same way. This is indeed the case: using the same reasoning as in the proof of [24, Theorem 4.3], we deduce that the two algorithms are not geometrically ergodic for $\beta \in (0,1)$. Similarly, the proof of [24, Theorem 4.1] can be used to show that the two algorithms are geometrically ergodic for $\beta \in [1,2)$. Furthermore, for values of $\beta \geq 2$ we have the following cases.

(a) For $\beta = 2$,

- fMALA is geometrically ergodic if $h\gamma(1 + h\gamma/6) \in (0,2)$ by [24, Theorem 4.1], and not geometrically ergodic if $h\gamma(1 + h\gamma/6) > 2$ by Theorem 4.2, since $\mu^{fM}$ is equivalent at infinity to $(1 - h\gamma(1 + h\gamma/6))x$ and $S^{fM}(x)$ is constant for $|x| \geq R_\pi$.

- Since /r™® is equivalent at infinity to — 2{h'yp/3)x, we observe that mOMA is 

geometrically ergodic if $h\gamma \in (0, 1.22)$ by [24, Theorem 4.1], and not geometrically
ergodic if $h\gamma > 1.23$ by [25, Theorem 5.1].

(b) For /3 > 2, fMALA and mOMA are not geometrically ergodic by Theorem 4.2 since the 
mean value maps of their proposal kernels are equivalent at infinity to —Ci 

their variance map to C 2 \x\ for some constants Ci^C 2 > 0, and the variance maps 
are bounded from below. 
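
For $\beta = 2$, the fMALA condition stated in case (a) reduces to a scalar test on $a = h\gamma(1 + h\gamma/6)$: the asymptotic mean map $(1-a)x$ is a strict contraction exactly when $a \in (0,2)$. The sketch below encodes this test and the critical value of $h\gamma$ obtained by solving $t + t^2/6 = 2$; the function names are hypothetical and the boundary case $a = 2$ is not covered by the statements above.

```python
import math


def fmala_linear_coefficient(h, gamma):
    """Coefficient (1 - a) of the asymptotic mean map (1 - a) x for beta = 2."""
    a = h * gamma * (1.0 + h * gamma / 6.0)
    return 1.0 - a


def fmala_regime_beta2(h, gamma):
    """Classify fMALA for beta = 2 from the condition on h*gamma*(1 + h*gamma/6)."""
    a = h * gamma * (1.0 + h * gamma / 6.0)
    if 0.0 < a < 2.0:
        return "geometrically ergodic"
    if a > 2.0:
        return "not geometrically ergodic"
    return "boundary case (not covered)"


# Positive root of t + t^2/6 = 2, i.e. t^2 + 6 t - 12 = 0.
CRITICAL_H_GAMMA = math.sqrt(21.0) - 3.0
```

With this reading, fMALA is geometrically ergodic for $\beta = 2$ as long as $h\gamma$ stays below $\sqrt{21} - 3 \approx 1.58$.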

4.3.2 Geometric ergodicity of bOMA 

In this section, we give some conditions under which bOMA is geometrically ergodic and some examples of densities which satisfy such conditions. For a matrix $M \in \mathbb{R}^{d \times d}$, we denote $\lambda_{\min}(M) = \min \mathrm{Sp}(M)$ and $\lambda_{\max}(M) = \max \mathrm{Sp}(M)$, where $\mathrm{Sp}(M)$ is the spectrum of $M$. We can observe three different behaviours of the proposal given by (19) when $x$ is large, which are implied by the behaviour of $\lambda_{\min}(Df(x))$ and $\lambda_{\max}(Df(x))$.

If liminf|| 3 ,||_j,_|_oo Amin (-D/(x)) = 0. Then, g{x) = o(jjxjj^) as jjxjj ^ 00 , and y*’® tends to 
be as the MALA proposal at infinity, and we can show that bOMA is geometrically ergodic 
with the same conditions introduced in [24] for this one. 

Example 4.3. By [24, Theorem 4.1], bOMA is geometrically ergodic for $\pi \in \mathcal{E}(\beta,\gamma)$ with $\beta \in [1,2)$.

Now, we focus on the case where $\limsup_{\|x\|\to+\infty} \lambda_{\max}(Df(x)) < 0$. For instance, this condition holds for $\pi \in \mathcal{E}(\beta,\gamma)$ when $\beta > 2$. We give conditions, similar to those for geometric convergence of the Ozaki discretization given in [9], to check the conditions of Lemma 4.1. Although these conditions do not cover all the cases, they seem to apply to interesting ones. Here are our assumptions, where we denote by $\mathbb{S}^d = \{x \in \mathbb{R}^d : \|x\| = 1\}$ the sphere in $\mathbb{R}^d$ and $n_x = x/\|x\|$.

Assumption 6. We assume:

1. $\limsup_{\|x\|\to+\infty} \lambda_{\max}(Df(x)) < 0$;

2. lim||3,||^+ooZl/(x)-2{i^ : D‘^f{x)] = 0; 

3. $Df(x)^{-1}f(x)$ is asymptotically homogeneous to $x$ when $\|x\| \to +\infty$, i.e. there exists a function $c : \mathbb{S}^d \to \mathbb{R}$ such that

$$\lim_{\|x\|\to+\infty} \left\| \frac{Df(x)^{-1}f(x)}{\|x\|} - c(n_x)\, n_x \right\| = 0 .$$

Condition 1 in Assumption 6 implies that $\lambda_{\max}(Df(x)) \leq M_f$ for all $x \in \mathbb{R}^d$ and some constant $M_f$, and guarantees that $S^{bO}(x,h)$ is bounded for all $x \in \mathbb{R}^d$.

Lemma 4.4. Assume Assumptions 4 and 6. There exists $M_S \geq 0$ such that for all $x \in \mathbb{R}^d$, $\|S^{bO}(x,h)\| \leq M_S$.

Proof. Since $S^{bO}(x,h)$ is symmetric for all $x \in \mathbb{R}^d$ and the associated scalar function of $t$ is bounded on $(-\infty, M]$ for all $M \in \mathbb{R}$, we just need to show that there exists $M_f \geq 0$ such that for all $x$, $\lambda_{\max}(Df(x)) \leq M_f$. First, by Assumption 6-(1), there exists $R \geq 0$ such that for all $x$ with $\|x\| \geq R$, $\mathrm{Sp}(Df(x)) \subset \mathbb{R}_-$. In addition, by Assumption 4, $x \mapsto Df(x)$ is continuous, so there exists $M \geq 0$ such that for all $x$ with $\|x\| \leq R$, $\|Df(x)\| \leq M$. □


Theorem 4.5. Assume Assumptions 4, 5 and 6. If

$$0 < \inf_{n \in \mathbb{S}^d} c(n) \leq \sup_{n \in \mathbb{S}^d} c(n) < 6/5 , \tag{34}$$

then bOMA is geometrically ergodic.


Proof. We check that the conditions of Lemma 4.1 hold. By Assumption 4 and (19), Assumption 3 holds, thus it remains to check (23). First, Lemma 4.4 implies that the second equality of (23) is satisfied, and we just need to prove the first equality. By [9, Lemma 3.4], it suffices to prove that

$$\limsup_{\|x\|\to+\infty} \left\langle \frac{\eta(x)}{\|x\|}, \frac{\eta(x)}{\|x\|} + 2 n_x \right\rangle < 0 , \tag{35}$$

where $\eta(x) = \mu^{bO}(x,h) - x$. Since $\limsup_{\|x\|\to+\infty} \lambda_{\max}(Df(x)) < 0$, we can write $\eta(x) = \mathcal{B}(x) Df(x)^{-1} f(x)$, where

^{x) = {e^h/2)Df{x) _ ^ ( 2 / 3 )^ 

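and $x \mapsto \mathcal{B}(x)$ is bounded on $\mathbb{R}^d$. Since $\mathcal{B}$ is bounded on $\mathbb{R}^d$, by Assumption 6-(2)-(3) and (34),

$$\lim_{\|x\|\to+\infty} \left| \left\langle \frac{\eta(x)}{\|x\|}, \frac{\eta(x)}{\|x\|} + 2 n_x \right\rangle - \left( \|\mathcal{B}(x) n_x\|^2 c(n_x)^2 + 2 \langle \mathcal{B}(x) n_x, n_x \rangle c(n_x) \right) \right| = 0 . \tag{36}$$

In addition, if we denote the eigenvalues of $\mathcal{B}(x)$ by $\{\lambda_i(x), i = 1,\ldots,d\}$ and by $\{e_i(x), i = 1,\ldots,d\}$ an orthonormal basis of eigenvectors, we have

$$\|\mathcal{B}(x) n_x\|^2 c(n_x)^2 + 2 \langle \mathcal{B}(x) n_x, n_x \rangle c(n_x) = \sum_{i=1}^{d} c(n_x) \lambda_i(x) \langle e_i(x), n_x \rangle^2 \left( c(n_x) \lambda_i(x) + 2 \right) . \tag{37}$$

Since $\limsup_{\|x\|\to+\infty} \lambda_{\max}(Df(x)) < 0$, for all $i$ and $\|x\|$ large enough, $\lambda_i(x) \in [-5/3, 0)$. Therefore, using (34), we get from (37):

$$\|\mathcal{B}(x) n_x\|^2 c(n_x)^2 + 2 \langle \mathcal{B}(x) n_x, n_x \rangle c(n_x) < 0 .$$

The proof is concluded using this result in (36). □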
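
The sign argument at the end of the proof can be illustrated numerically: if $c \in (0, 6/5)$ and every eigenvalue $\lambda_i \in [-5/3, 0)$, then $c\lambda_i \in (-2, 0)$, so each term $c\lambda_i(c\lambda_i + 2)$ in (37) is negative. The sketch below evaluates both sides of (37) for a symmetric matrix; the names `lhs_37` and `rhs_37` are illustrative, and the matrix `B` stands in for $\mathcal{B}(x)$.

```python
import numpy as np


def lhs_37(B, n, c):
    """||B n||^2 c^2 + 2 <B n, n> c for a symmetric matrix B and unit vector n."""
    Bn = B @ n
    return float(c**2 * (Bn @ Bn) + 2.0 * c * (Bn @ n))


def rhs_37(B, n, c):
    """Spectral form: sum_i c*lam_i*<e_i, n>^2 * (c*lam_i + 2)."""
    lam, E = np.linalg.eigh(B)       # columns of E are orthonormal eigenvectors
    w = (E.T @ n) ** 2               # squared coordinates of n in the eigenbasis
    return float(np.sum(c * lam * w * (c * lam + 2.0)))
```

The two functions agree for any symmetric `B`, and the common value is negative whenever the spectrum lies in $[-5/3, 0)$ and $c \in (0, 6/5)$.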


Application to the convergence of bOMA for the multidimensional exponential class

For the proof of the main result of this section, we need the following lemma.

Lemma 4.6 ([9, Proof of Theorem 4.10]). Let $\pi$ belong to the positive-definite subclass of Section 4.2.2, for $m \geq 2$. Then $\pi$ satisfies Assumption 6-(3) with $c(n) = 1/(m-1) \in (0, 6/5)$ for all $n \in \mathbb{S}^d$.

Proposition 4.7. Let $\pi$ belong to the positive-definite subclass of Section 4.2.2, for $m \geq 2$. Then bOMA is $V$-geometrically ergodic, with $V(x) = \|x\|^2 + 1$.

Proof. Let us write $\pi \propto \exp(-p(x) - r(x))$, with $p$ and $r$ satisfying the conditions from the definition in Section 4.2.2. We prove that Theorem 4.5 can be applied. First, by definition of the class, Assumption 4 is satisfied. Furthermore, Assumption 6-(1)-(2) follows from (26), (29), (30) and the condition that $\nabla^2 p(n)$ is positive definite for all $n \in \mathbb{S}^d$. Also, by Lemma 4.6, Assumption 6-(3) is satisfied.

Now we focus on Assumption 5. For ease of notation, in the following we denote $\mu^{bO}$ and $S^{bO}$ by $\mu$ and $S$, and do not mention the dependence on the parameter $h$ of $\mu$ and $S$ when it does not play any role. Note that

$$\int_{\mathbb{R}^d} \alpha(x,y)\, q(x,y)\, dy = (2\pi)^{-d/2} \int_{\mathbb{R}^d} \{1 \wedge \exp \tilde{\alpha}(x,\xi)\} \exp(-\|\xi\|^2/2)\, d\xi , \tag{38}$$

where

$$\begin{aligned} \tilde{\alpha}(x,\xi) = {} & -p(\mu(x) + S(x)\xi) + p(x) - r(\mu(x) + S(x)\xi) + r(x) - \log(|S(\mu(x) + S(x)\xi)|) + \log(|S(x)|) + (1/2)\|\xi\|^2 \\ & - (1/2) \left\langle (S(x,\xi))^{-1} \{x - \mu(\mu(x) + S(x)\xi)\}, x - \mu(\mu(x) + S(x)\xi) \right\rangle , \end{aligned} \tag{39}$$

and $S(x,\xi) = S(\mu(x) + S(x)\xi)\, S(\mu(x) + S(x)\xi)^T$. First, we consider $m \geq 3$; then we have the following estimates of the terms in (39) by (26)-(30) and Lemma 4.6:

$$\mu(w) = \{1 - 5/(3(m-1))\}\, w + o(\|w\|) \quad \text{as } \|w\| \to +\infty , \tag{40}$$

{S{w)S{w)'^)~^ = jm(m - 1) V^p(n^) + o(||r(;||™'“^)   (41)

$$\log(|S(w)|) = o(\|w\|) \quad \text{as } \|w\| \to +\infty . \tag{42}$$

Then by (40)-(42), if we define $\Psi : [3, +\infty) \to \mathbb{R}$ by

r s'!™ 

m !-)• 1 — < 1 -;-r > 

I 3(m- 1)/ 

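we get

$$\tilde{\alpha}(x,\xi) = \|x\|^m p(n_x) \Psi(m) + o(\|x\|^m) \quad \text{as } \|x\| \to +\infty .$$

Since $\Psi$ is positive on $[3, +\infty)$, for all $\xi \in \mathbb{R}^d$, $\lim_{\|x\|\to+\infty} \tilde{\alpha}(x,\xi) = +\infty$. This result, (38) and Fatou's Lemma imply that Assumption 5 is satisfied.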

For m = 2, we can assume p(x) = (Ax, x) with A € Let us denote for M an invertible 

matrix of dimension p > 1, 

e(M) = (e-^-Ip) + (2/3)(e-^^-Ip) 

?(M) = (e-2^^ - Ip) + (1/3) - Ip) . 

Then we have the following estimates: 

5(x, ^) = (A(?(/iA))“^ { {2g{hA) + g{hA)^) x} , {2g{hA) + g{hA)^) x) 

||a:||—>+00 

+ (Ax,x) - (A{(Irf+£)(/iA))x} , {ld+g{hA))x) + o(||x||^) (43) 

If we denote the eigenvalues of A by {Aj, z = 1... d} and {xj, i = 1 ,..., d} the coordinates of 
X in an orthonormal basis of eigenvectors for A, (43) becomes 

d 

x H -+00 , 

2=1 

where for d, A > 0, 

.=.(/j, A) = A (1 — {g{hX) + 1)^ + ?(/iA) ^ (4^5(/iA)^ + Ag^hX)^ + ^(/zA)^)) . 

Using that for any $h, \lambda > 0$, $\Xi(h,\lambda) > 0$ and (44), we have for all $\xi \in \mathbb{R}^d$, $\lim_{\|x\|\to+\infty} \tilde{\alpha}(x,\xi) = +\infty$, and as in the first case Assumption 5 is satisfied. □

Remark 4.8. Using the same reasoning as in Proposition 4.7, one can show that bOMA is geometrically ergodic for $\pi \in \mathcal{E}(\beta,\gamma)$ with $\beta \geq 2$.

We now summarise the behaviour of all the different algorithms for the one-dimensional class $\mathcal{E}(\beta,\gamma)$ in Table 1.

4.4 Convergence of Gaussian Markov kernels on $\mathbb{R}$

We now present precise results for the ergodicity of the unadjusted proposals, by extending the results of [24] for the ULA to Gaussian Markov kernels on $\mathbb{R}$. Under Assumption 3, it is straightforward to see that $Q$ is $\mathrm{Leb}^d$-irreducible, where $\mathrm{Leb}^d$ is the Lebesgue measure, aperiodic, and that all compact sets of $\mathbb{R}^d$ are small; see [9, Theorem 3.1]. We now state our main theorems, which essentially complete [24, Theorems 3.1-3.2]. Since their proofs are very similar, they are omitted.

Method       $\beta \in [1,2)$        $\beta = 2$                     $\beta > 2$
fMALA (12)   geometrically ergodic    geometrically ergodic or not    not geometrically ergodic
mOMA (13)    geometrically ergodic    geometrically ergodic or not    not geometrically ergodic
bOMA (19)    geometrically ergodic    geometrically ergodic           geometrically ergodic

Table 1: Summary of ergodicity results for the Metropolis-Hastings algorithms for the class $\mathcal{E}(\beta,\gamma)$.


Theorem 4.9. Assume Assumption 3, and there exist $s_a, u_+, u_- \in \mathbb{R}_+^*$ and $\chi \in \mathbb{R}$ such that:

$$\limsup_{|x|\to+\infty} S(x) \leq s_a ,$$

$$\lim_{x\to+\infty} \{\mu(x) - x\}\, x^{-\chi} = -u_+ , \quad \text{and} \quad \lim_{x\to-\infty} \{\mu(x) - x\}\, |x|^{-\chi} = u_- .$$

(1) If $\chi \in [0,1)$, then $Q$ is geometrically ergodic.

(2) If $\chi = 1$ and $(1 - u_+)(1 - u_-) < 1$, then $Q$ is geometrically ergodic.

(3) If $\chi \in (-1,0)$, then $Q$ is ergodic but not geometrically ergodic.

Proof. See the proof of [24, Theorem 3.1]. □

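Theorem 4.10. Assume Assumption 3, and there exist $s_v, u_+, u_- \in \mathbb{R}_+^*$ and $\chi \in \mathbb{R}$ such that:

$$\liminf_{|x|\to+\infty} S(x) \geq s_v ,$$

$$\lim_{x\to+\infty} S(x)^{-1}\mu(x)\, x^{-\chi} = -u_+ , \quad \text{and} \quad \lim_{x\to-\infty} S(x)^{-1}\mu(x)\, |x|^{-\chi} = u_- .$$

(1) If $\chi > 1$, then $Q$ is transient.

(2) If $\chi = 1$ and $(u_+ \wedge u_-)\, s_v > 1$, then $Q$ is transient.

Proof. See the proof of [24, Theorem 3.2]. □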

Ergodicity of the unadjusted proposals for the class $\mathcal{E}(\beta,\gamma)$

We now apply Theorems 4.9 and 4.10 in order to study the ergodicity of the different unadjusted proposals applied to $\pi \in \mathcal{E}(\beta,\gamma)$. In the case $\beta \in (0,2)$, all three algorithms (fULA, mUOA, bUOA) have their mean map behaving like $x - h\beta\gamma\, x|x|^{\beta-2}/2$ at infinity and their variance map bounded from above. This is exactly the behaviour that ULA [24] has for the same values of $\beta$, thus it should not be a surprise that Theorem 4.9 implies that all three algorithms behave as the ULA does for the corresponding values, namely being ergodic for $\beta \in (0,1)$ and geometrically ergodic for $\beta \in [1,2)$. Furthermore, for values of $\beta \geq 2$ we have the following cases.

(a) For $\beta = 2$,

- fULA is geometrically ergodic if $h\gamma(1 + h\gamma/6) \in (0,2)$ by Theorem 4.9-(2), and is transient if $h\gamma(1 + h\gamma/6) > 2$ by Theorem 4.10-(2), since $\mu^{fM}$ is equivalent at infinity to $(1 - h\gamma(1 + h\gamma/6))x$ and $S^{fM}(x)$ is constant for $|x| \geq R_\pi$.

- mlJOA is geometrically ergodic if 1 + 2{hj)'^/3 — G (0, 2) by Theorem 4.9-(2), and 

is transcient if 1 + 2 (/i 7)^/3 — e~'^^ > 2 by Theorem 4.10-(2), since /r™® is equivalent 
at infinity to — 2{h'y)‘^/3)x and is constant for |x| > Rt^. 

- bUOA is geometrically ergodic by Theorem 4.9-(2), since $\mu^{bO}$ is equivalent at infinity to $-2x/3$ and $S^{bO}(x)$ is constant for $|x| \geq R_\pi$.

(b) For /3>2, 

- fULA and mUOA are transient by Theorem 4.10-(1) since their mean value map is

equivalent at infinity to —Ci /x, and their variance map to C 2 for some 

constants Ci,C 2 > 0 , and their variance map are bounded from below. 

- bUOA is geometrically ergodic by Theorem 4.9-(1) since its mean value map is equivalent at infinity to $\{1 - 5/(3(\beta - 1))\}\, x$ and its variance map is bounded from above.

The summary of our findings can be found in Table 2.


Method      $\beta \in (0,1)$   $\beta \in [1,2)$        $\beta = 2$                       $\beta > 2$
fULA (12)   ergodic             geometrically ergodic    geometrically ergodic/transient   transient
mUOA (13)   ergodic             geometrically ergodic    geometrically ergodic/transient   transient
bUOA (19)   ergodic             geometrically ergodic    geometrically ergodic             geometrically ergodic

Table 2: Summary of ergodicity results for the unadjusted proposals for the class $\mathcal{E}(\beta,\gamma)$.


5 Numerical illustration of the improved efficiency 

In this section, we illustrate our analysis (Section 3.1) of the asymptotic behaviour of fMALA as the dimension $d$ tends to infinity, and we demonstrate its gain of efficiency as $d$ increases compared to the standard MALA. Following [21], we define the first-order efficiency of a multidimensional Markov chain $\{X_k, k \in \mathbb{N}\}$ with first component denoted $X_k^{(1)}$ as $\mathbb{E}[(X_{k+1}^{(1)} - X_k^{(1)})^2]$.
In Figure 1, we consider as a test problem the product case (8) using the double well potential $g$ in dimensions $d = 10, 100, 500, 1000$, respectively. We consider many time stepsizes $h = \ell^2 d^{-1/5}$, plotting the first-order efficiency (multiplied by $d^{1/5}$ because this is the scale which is asymptotically constant for fMALA as $d \to \infty$) as a function of the acceptance rate for the standard MALA (white bullets) and the acceptance rate of the improved version fMALA (black bullets), respectively. For simplicity, each chain is started from the origin. The expectations are approximated as the average over 2 x 10^ iterations of the algorithms and we use the same sets of generated random numbers for both methods. For comparison, we also include (as solid lines) the asymptotic efficiency curve of fMALA as $d$ goes to infinity, normalised to have the same maximum as fMALA in finite dimension $d$. This corresponds to the (rescaled) limiting diffusion speed $h^{fM}(\ell)$ as a function of $a^{fM}(\ell)$ (quantities given respectively in Theorems 3.1 and 3.2). We observe excellent agreement of the numerical first-order efficiency with the asymptotic one, especially as $d$ increases, which corroborates the scaling results of fMALA. In addition, we observe for the considered dimensions $d$ that the optimal acceptance rate maximizing the first-order efficiency remains very close to the limiting value of 0.704 predicted in Theorem 3.2. This numerical experiment shows that the efficiency improvement of fMALA compared to MALA is significant and indeed increases as the dimension $d$ increases, which confirms the analysis of Section 3.1.

Figure 1: First-order efficiency of the new fMALA and the standard MALA for the double well potential $g$ as a function of the overall acceptance rates in dimensions $d = 10, 100, 500, 1000$. The solid line is the reference asymptotic curve of efficiency for the new fMALA, normalised to have the same maximum value as the finite dimensional fMALA.
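
The two quantities plotted against each other in Figure 1 can be estimated from a single run. The sketch below does this for the standard MALA only, under simplifying assumptions that are not those of the experiment: the target is a product of standard Gaussians rather than the double well potential, the step size is $h = \ell^2 d^{-1/3}$, and the function name `mala_first_order_efficiency` is hypothetical. The fMALA proposal (12) is not reproduced here.

```python
import numpy as np


def mala_first_order_efficiency(d, ell, n_iter, rng, scale_exp=1.0 / 3.0):
    """Estimate (acceptance rate, first-order efficiency) of MALA on N(0, I_d).

    The first-order efficiency is the average of (X_{k+1}^{(1)} - X_k^{(1)})^2,
    with rejected moves contributing zero.
    """
    h = ell**2 * d ** (-scale_exp)
    log_pi = lambda z: -0.5 * float(z @ z)
    drift = lambda z: z - 0.5 * h * z           # z + (h/2) grad log pi(z)
    x = np.zeros(d)                              # chain started from the origin
    n_acc, sq_jump = 0, 0.0
    for _ in range(n_iter):
        y = drift(x) + np.sqrt(h) * rng.standard_normal(d)
        log_q_xy = -np.sum((y - drift(x)) ** 2) / (2.0 * h)
        log_q_yx = -np.sum((x - drift(y)) ** 2) / (2.0 * h)
        log_ratio = log_pi(y) - log_pi(x) + log_q_yx - log_q_xy
        if np.log(rng.uniform()) < log_ratio:
            sq_jump += (y[0] - x[0]) ** 2
            x = y
            n_acc += 1
    return n_acc / n_iter, sq_jump / n_iter
```

Sweeping `ell` over a grid and plotting the second output (suitably rescaled in $d$) against the first gives curves of the kind shown in Figure 1.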

For our next experiments, we consider the $d$-dimensional zero-mean Gaussian distribution with covariance matrix $I_d$ for $d = 1000$ as target distribution. We aim to numerically study the transient behaviour of fMALA and propose some solutions to overcome this issue. In Figure 2, we plot the squared norm of 10^ samples generated by the RWM, MALA, fMALA and some hybrid strategies for MALA and fMALA, all started from the origin. We also include a zoom on the first 100 steps.

Figure 2: Trace plots of $\|X\|^2$ for the Gaussian target density in dimension $d = 1000$ when starting at the origin. Panels: (a) standard schemes; (b) hybrid methods using RWM ($h \sim d^{-1}$); (c) hybrid methods using MALA. Comparison of fMALA with $h \sim d^{-1/5}$ (solid lines), MALA with $h \sim d^{-1/3}$ (dashed lines), RWM with $h \sim d^{-1}$ (dotted lines).

Figure 3: Autocorrelation versus lag for the Gaussian target density in dimension $d = 1000$. Comparison of fMALA with $h \sim d^{-1/5}$ (black), MALA with $h \sim d^{-1/3}$ (white), RWM with $h \sim d^{-1}$ (gray).

In Figure 2a, we use standard implementations of the schemes. The time step $h$ for each algorithm is chosen as the optimal parameter based on the optimal scaling results of all the algorithms at stationarity: for the RWM $h = 2.38^2 d^{-1}$, for MALA $h = 1.65^2 d^{-1/3}$, and for fMALA its own optimal value. It can be observed that MALA exhibits many rejected steps in contrast to RWM. This is a known issue of MALA in the transient phase [4, 12], due to a tiny acceptance probability at the first steps, and the same behaviour can be observed for fMALA, with zero accepted steps in the present simulation. To circumvent this issue, the following hybrid MALA scheme was presented in [4]. The idea is to combine MALA with RWM at each step: with probability 1/2, we apply the MALA proposal (4) with step size $h = 1.65^2 d^{-1/3}$, the optimal parameter for MALA at stationarity. Otherwise, the RWM proposal (3) is used with step size $h = 2.38^2 d^{-1}$, the optimal parameter for the RWM at stationarity. Indeed, [4] and [12] have shown that, for the RWM, the optimal scaling in the transient phase and at stationarity is the same and scales as $d^{-1}$. In Figure 2b, the plots for this hybrid MALA are presented. The same methodology is also applied to the hybrid fMALA scheme, showing a behaviour similar to hybrid MALA. In Figure 2c, the RWM proposal is replaced by the MALA proposal (4) with a different step size $h$, which is the optimal parameter for MALA in the transient phase according to [4]. Again, hybrid fMALA exhibits a behaviour similar to hybrid MALA.
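
The hybrid MALA/RWM scheme described above can be sketched as follows for the standard Gaussian target, using the step sizes $1.65^2 d^{-1/3}$ and $2.38^2 d^{-1}$ quoted in the text. This is an illustration only: the function names are hypothetical, and the hybrid fMALA variant is not covered because the fMALA proposal (12) is not reproduced here. Each of the two moves is a valid Metropolis-Hastings move on its own, so choosing between them with a fixed probability leaves the target invariant.

```python
import numpy as np


def log_pi(x):
    """Log density of the standard Gaussian target, up to a constant."""
    return -0.5 * float(x @ x)


def hybrid_mala_rwm_step(x, rng, h_mala, h_rwm):
    """With probability 1/2 a MALA move, otherwise an RWM move; returns (state, accepted)."""
    noise = rng.standard_normal(x.shape[0])
    if rng.uniform() < 0.5:
        drift = lambda z: z - 0.5 * h_mala * z   # z + (h/2) grad log pi(z)
        y = drift(x) + np.sqrt(h_mala) * noise
        log_q_xy = -np.sum((y - drift(x)) ** 2) / (2.0 * h_mala)
        log_q_yx = -np.sum((x - drift(y)) ** 2) / (2.0 * h_mala)
        log_ratio = log_pi(y) - log_pi(x) + log_q_yx - log_q_xy
    else:
        y = x + np.sqrt(h_rwm) * noise           # symmetric proposal
        log_ratio = log_pi(y) - log_pi(x)
    if np.log(rng.uniform()) < log_ratio:
        return y, True
    return x, False


def run_hybrid(d, n_steps, seed=0):
    """Run the hybrid chain from the origin; return the trace of ||X||^2 and the acceptance rate."""
    rng = np.random.default_rng(seed)
    h_mala = 1.65**2 * d ** (-1.0 / 3.0)
    h_rwm = 2.38**2 / d
    x = np.zeros(d)
    sq_norms = [0.0]
    n_acc = 0
    for _ in range(n_steps):
        x, acc = hybrid_mala_rwm_step(x, rng, h_mala, h_rwm)
        n_acc += acc
        sq_norms.append(float(x @ x))
    return np.array(sq_norms), n_acc / n_steps
```

Plotting the first output of `run_hybrid` against the step index gives a trace of $\|X\|^2$ of the kind shown in Figure 2b.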

In Figure 3, we consider again the same schemes and hybrid versions as in Figure 2, with the same step sizes, and we compare their autocorrelation functions. We consider for each algorithm 2 x 10^ iterations started at stationarity, where the first 10^ iterations were discarded as burn-in. In Figure 3a, it can be observed that the autocorrelation associated with fMALA goes to 0 more quickly than for the RWM and MALA. In Figures 3b and 3c, we observe that by using hybrid strategies which are designed to robustify convergence from the transient phase, fMALA still comfortably outperforms MALA in terms of expected square efficiency (which is a stationary quantity).
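
The autocorrelation curves compared in Figure 3 can be computed from a scalar summary of the chain (for instance one coordinate) with the usual empirical estimator. The sketch below is one such estimator; the function name `autocorrelation` and the choice of normalisation by the full sample size are assumptions, not taken from the text.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Empirical autocorrelation of a 1-D series for lags 0, 1, ..., max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # centre the series
    n = x.size
    var = np.dot(x, x) / n                # empirical variance (lag-0 autocovariance)
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var)
                     for k in range(max_lag + 1)])
```

Samples from the burn-in period should be removed from `series` before calling the function, as described above.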

Although our analysis applies only to product measure densities of the form (8), we next consider the following non-product density in $\mathbb{R}^d$, defined using a normalization constant $Z_d$ and for $X_0 = 0$ as

$$\pi(X_1, \ldots, X_d) = Z_d \prod_{i=1}^{d} \frac{1}{1 + (X_i - a(X_{i-1}))^2} , \tag{45}$$

where we consider the scalar functions $a(x) = x/2$ and $a(x) = \sin(x)$, respectively. Notice that the density (45) is associated with the AR(1) process $X_i = a(X_{i-1}) + Z_i$ with non-Gaussian (Cauchy) increments $Z_i$. Furthermore, we observe that in this case the Jacobian in (12) is a symmetric tridiagonal matrix, which implies that the computational cost of the fMALA proposal is of the same order $O(d)$ as the standard MALA proposal.
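
The structure of (45) can be made explicit in code. The sketch below evaluates $\log \pi$ up to the constant $\log Z_d$ and its gradient in $O(d)$ operations, and builds a finite-difference Jacobian of the gradient, which can be used to check the symmetric tridiagonal structure mentioned above. All function names are illustrative, and `a_prime` (the derivative of $a$) is an extra argument introduced for the sketch.

```python
import numpy as np


def residuals(x, a):
    """r_i = x_i - a(x_{i-1}) with the convention x_0 = 0."""
    prev = np.concatenate(([0.0], x[:-1]))
    return x - a(prev)


def log_pi(x, a):
    """log pi(x) up to the additive constant log Z_d."""
    r = residuals(x, a)
    return -float(np.sum(np.log1p(r**2)))


def grad_log_pi(x, a, a_prime):
    """Gradient of log pi; component j only involves x_{j-1}, x_j, x_{j+1}."""
    r = residuals(x, a)
    w = 2.0 * r / (1.0 + r**2)
    g = -w                                   # contribution of the i = j factor
    g[:-1] += w[1:] * a_prime(x[:-1])        # contribution of the i = j + 1 factor
    return g


def fd_jacobian_of_gradient(x, a, a_prime, eps=1e-5):
    """Central finite-difference Jacobian of grad_log_pi (dense, for checking only)."""
    d = x.size
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        H[:, j] = (grad_log_pi(x + e, a, a_prime)
                   - grad_log_pi(x - e, a, a_prime)) / (2.0 * eps)
    return H
```

For instance, $a(x) = \sin(x)$ corresponds to `a=np.sin, a_prime=np.cos`. The dense Jacobian is only meant for verification in small dimension; an efficient implementation would store the three diagonals.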

In Figure 4, we compare for many timesteps the standard MALA (left pictures) and the new fMALA (right pictures), and plot the (scaled) first-order efficiency $\mathbb{E}[\|X_{k+1} - X_k\|^2 / d]$ as a function of the overall acceptance rates, using the averages over 2 x 10^ iterations of the algorithms. The initial condition for both algorithms is the same and is obtained after running 10^ steps of the RWM algorithm to get close to the target probability measure. Analogously to the product case studied in Figure 1, we observe in both cases $a(x) = x/2$ and $a(x) = \sin(x)$ that the first-order efficiency of fMALA converges to a non-zero limiting curve with maximum close to the value 0.704. In contrast, the efficiency of the standard MALA drops to zero in this scaling, where the first-order efficiency is multiplied by $d^{1/5}$. This numerical experiment suggests that our analysis in the product measure setting persists in the non-product measure case.

A Proof of Theorems 3.1 and 3.2


We provide here the proofs of Theorems 3.1 and 3.2 for the analysis of the optimal scaling
properties of fMALA. We use tools analogous to those of [20] and [21]. Consider the generator
of the jump process E'^’^, defined for G and x G by 




where y follows the distribution defined by g^(x,-). Also, consider the generator of the 
process {Gt,t > 0}, solution of (18), defined for ip G G^M), and x G by 

A^ip{x) = {'h{£)/2){i}'{xi)g{xi) + V'"(xi)) . 


We check that the assumptions of [6, Corollary 8.7, Chapter 4] are satisfied, which will imply Theorem 3.2. These assumptions consist in showing that there exists a sequence of sets $\{F_d \subset \mathbb{R}^d, d \in \mathbb{N}^*\}$ such that for all $T > 0$:


lim P 

d—>-+oo 


G Fd, Vs G [ 0 ,r] 


= 1 


lim sup 

d^+oo x&Fd 


AfV’(x) 


A^V'(x) 


= 0 , 


for all functions $\psi$ in a core of $A^{fM}$ which strongly separates points. Since $A^{fM}$ is an operator on the set of functions only depending on the first component, we restrict our study to this class of functions, which belong to $C_c^{\infty}(\mathbb{R})$, since by [6, Theorem 2.1, Chapter 8], this set of functions is a core for $A^{fM}$ which strongly separates points. The following lemma is the proper result which was introduced in Section 2.2. For the sequel, let $\{\xi_i, i \in \mathbb{N}^*\}$ be a sequence of i.i.d. standard one-dimensional Gaussian random variables and $X$ be a random variable distributed according to $\pi_1$. Also, for all $x \in \mathbb{R}^d$, denote by $y^{fM}$ the proposal of fMALA, defined by (9), (12a) and (12b), started at $x$ with parameter $h_d$ and associated with the $d$-dimensional Gaussian random variable $\{\xi_i, i = 1, \ldots, d\}$.

Lemma A.l. Assume Assumption 2. The following Taylor expansion in h^ holds: for all 
X G and i G {1, • • • , d}, 


V TT{xi)q^{xi,yf^) ) 


10 

= '^Cf\xi,ii)d~ 

j=5 


+ Cff{xi,Ci,hd) , 


(46)

where $C_5^{fM}$ is given in Appendix C. Furthermore, for $j = 6, \ldots, 10$, $C_j^{fM}(x_i, \xi_i)$ are polynomials in $\xi_i$ and derivatives of $g$ at $x_i$, and




E 

cf(x,6)' 


/ 



\ 21 

E 

(e 

cr(x,6)|x) 

) 


0forj = 5,--- ,9, 


= -2E 


-fM \2 


tfM / 


(47) 

(48) 


In addition, there exists a sequence of sets {F^ C d G N*} such that limd_>.+oo df/^-ndiiF^Y) = 
0 and for j = 6, - ■ ■ ,10 


and 


Finally, 


with 


lim d sup E 

d —^“)~oo 




d 

E 

1=2 


C,-(xf,e.)-E Cf(X,e*) 


= 0 , 


lim sup E 

d^+oo 


d 


i=2 


= 0 . 


lim sup E 

d^+oo 



= 0 , 


(49) 


(50) 

(51) 


d 

1=2 


7r{yf^)q^iyf^,xY 

Tr{xi)q^{xi,yf^) 




Proof. The Taylor expansion was computed using the computational software Mathematica [27]. Then, since just odd powers of $\xi_i$ occur in $C_5^{fM}$, $C_7^{fM}$ and $C_9^{fM}$, we deduce (47) for $j = 5, 7, 9$.
Furthermore by explicit calculation, the anti-derivative in xi of C^{xi,^i) , for 

j = 6 , 8 , and [C|^^(xi, 6 )^ + 2 Cf^(xi, 6 )] are on the form of some polynomials in 

the derivatives of g in xi times Therefore, Assumption 2-(3) implies (47) for j = 6,8 

and (48). We now build the sequence of sets FJ, which satisfies the claimed properties. 

Denote for j = 6 , • • • , 10 and Xi G M, Cj^(xj) = E C^{xi, ^i) and V^(xj) = Var C^{xi,^i) 

which are bounded by a polynomial Pi in Xi by Assumption 2-(2) since C^{xi,f,i) are poly¬ 
nomials in and the derivatives of g at x*. Therefore for all A: G N*, 


E 


Cf^(X) 


-hE 


Vf^(X) 


< -I- 00 . 


(52) 


Consider for all j = 6, • • • , 10, the sequence of sets Fj 
where 


d,j 


P'd,j,i = {x^ 


^dj,2 ={xG 


^Cf(x,)-E 

i=2 

d 

^Vf^(X) -E 


defined by Fj - = Fl- ^^ n Fj -^^ 

(53) 

(54) 


i=2

Note that $\lim_{d\to+\infty} d^{1/5} \pi_d((F^1_{d,j})^c) = 0$ for all $j = 6, \ldots, 10$ is implied by $\lim_{d\to+\infty} d^{1/5} \pi_d((F^1_{d,j,1})^c) = 0$ and $\lim_{d\to+\infty} d^{1/5} \pi_d((F^1_{d,j,2})^c) = 0$. Let $\{X_i, i \geq 2\}$ be a sequence of i.i.d. random variables with distribution $\pi_1$. By definition of $F^1_{d,j,1}$, the Markov inequality and independence, we get


d^/\di{Flj^ir) < d-2Vi0iE 


■/ d 



^Cf(X,)-E 

Cj^(X)' 


\i=2 


/ _ 


d 

< E E 

21 , 22=2 

< d"Lio ]E 


^fM/ 

'^3 


Cf^(XiJ-E Cf^(X) ) (C^(Xi,)-E Cf^(X) 


^3 


^fM/ 

'"j 


C“(X)-E C “(X) 


(55) 


where we have used the Young inequality for the last line. On another hand, using the 
Chebyshev and Holder inequality, we get 



/ ^ 

\ 

d^/^ndUFl^^^r) < 

(^^Vf(X,)-E[vf(X)Jj 

< rf-Lio E 

(^vj^(x)-E vj^(x) y 



(56) 


Therefore (52), (55) and (56) imply that $\lim_{d\to+\infty} d^{1/5} \pi_d((F^1_{d,j})^c) = 0$ for all $j = 6, \ldots, 10$. In addition, for all $x \in F^1_{d,j}$, by the triangle inequality and the Cauchy-Schwarz inequality we have for all $j = 6, \ldots, 10$


E 


^Cf(xi, 6 )-E[cf (X,e*) 


i=2 


< 


+ 


^Vf(xi)-E[vf (X) 
11/2 


i=2 


1/2 


vf (X) 


+ 


j;cf(xi)-E Cf(X,^,) 


i=2 


Therefore by this inequality, (53) and (54), there exists a constant Mi such that 

d 


sup E 


x£F} ■ 
d,3 


^Cf(x„ei)-IE Of(X,e*) 


i=2 


< d-^/^^Mi 


and (49) follows. It remains to show (50). By definition, $C_{11}^{fM}$ is the remainder in the eleventh order expansion in $\sigma_d := \sqrt{h_d}$ given by (46) of the function $\Theta$ defined by $\Theta(x_i, \xi_i, \sigma_d) = \log(\pi_1(y_i^{fM})\, q_1^{fM}(y_i^{fM}, x_i)) - \log(\pi_1(x_i)\, q_1^{fM}(x_i, y_i^{fM}))$. Therefore, by the mean-value form of the remainder, there exists $u_d \in [0, \sigma_d]$ such that

$$C_{11}^{fM}(x_i, \xi_i, h_d) = \frac{\sigma_d^{11}}{11!}\, \frac{\partial^{11} \Theta}{\partial \sigma_d^{11}}(x_i, \xi_i, u_d) .$$

By Assumption 2-(1), which implies that $g''$ is bounded, and Assumption 2-(2), for all $u_d \in [0, \sigma_d]$, the eleventh derivative of $\Theta$ with respect to $\sigma_d$, taken at $(x_i, \xi_i, u_d)$, can be bounded by a positive polynomial in $(x_i, \xi_i)$ of the form $P_2(x_i) P_3(\xi_i)$. Hence, there exists a constant $M_2$ such that


E[\Cn{xu^i,hd)\] < M 2 p2(^.) . 


(57) 


And if we define 


= < a; G 


^P2 (x,)-E[P 2(X)] 


i=2 


<d' 


then we have by the Chebychev inequality, this definition and (57) 


<Ye.v[¥2{X)]d-^l^ 

d 

sup ^E[\Cu{xi,^i,hd)\] < M2(E[P2 (x)] + l)(i“^/^° . 
i=2 


These results, combined with Assumption 2-(3), imply limrf_^+c« rf^^^7rd((-Piii)‘^) = 0 and 
(50). Finally, Fj = 0^=6 ^dj satisfies the claimed properties of the Lemma, and (51) directly 
follows from all the previous results. □ 


To isolate the first component of the process T'^’^, we consider the modified generators 
defined for V’ G Cc(M'^) and x G by 


Af^ix) = rfi/^E - i;ix))a^,^,ix,y^) 


where for all x,y G 


a 


fM 


iAx,y) = 


i=2 


'^1 (aJi)iZi,fM(aJi) Vi) 


The next result shows that we can approximate the generator of the jump process by the modified one, and thus, in essence, the first component becomes “asymptotically independent” from the others.

Theorem A.2. There exists a sequence of sets $\{F_d^2 \subset \mathbb{R}^d, d \in \mathbb{N}^*\}$ such that $\lim_{d\to+\infty} d^{1/5} \pi_d((F_d^2)^c) = 0$ and for all $\psi \in C_c^{\infty}(\mathbb{R})$ (seen as a function of $\mathbb{R}^d$ for all $d$ which only depends on the first component):

In addition, 


lim sup 


lim sup d^/^E 


ATY{x)-ATY{x 


aT{x,y^)-a^YUx,yn 


fM 


= 0 


.fM' 


= 0 . 


(58) 


Proof. Using that if is bounded and the Jensen inequality, there exists a constant Mi such 
that 


ATYix) - A7V(x) 


fM„ 


< Mid^F-E 


af(x,f/f^)-a(i%(x,y™) 


fM 


..fM' 


Thus it suffices to show (58). Set = y/hd- Since t i-G 1 A exp(t) is 1-Lipschtz on M and, by 
definition we have 


d^/^E [ a^{x,y^) - a^,d{x,y^) 1 < d^^^E [|0(xi,cJd)|] 


(59)

where $\Theta(x_1, \xi_1, \sigma_d) = \log(\pi_1(y_1^{fM})\, q_1^{fM}(y_1^{fM}, x_1)) - \log(\pi_1(x_1)\, q_1^{fM}(x_1, y_1^{fM}))$. By a fifth order Taylor expansion of $\Theta$ in $\sigma_d$, and since by (46) $\partial^j \Theta(x_1, \xi_1, 0) / \partial \sigma_d^j = 0$ for $j = 0, \ldots, 4$, we have

$$\Theta(x_1, \xi_1, \sigma_d) = \frac{\partial^5 \Theta}{\partial \sigma_d^5}(x_1, \xi_1, u_d)\, \frac{\sigma_d^5}{5!} ,$$

for some $u_d \in [0, \sigma_d]$. Using Assumption 2-(1)-(2), and an explicit expression of $\partial^5 \Theta(x_1, \xi_1, u_d) / \partial \sigma_d^5$, there exist two positive polynomials $P_1$ and $P_2$ such that

$$|\Theta(x_1, \xi_1, \sigma_d)| \leq (\sigma_d^5 / 5!)\, P_1(x_1) P_2(\xi_1) .$$

Plugging this result in (59) and since $\sigma_d^2 = \ell^2 d^{-1/5}$, we get




aT{x,y^)-a^,A^,y^) 


<£p2^-3A0p^(xi). 


Setting F| = {x G ; Pi(xi) < we have 


sup dP^E 
x^Fj 


aT{x,A^^)-AAlAx,yn 


< £p2rf-P5 , 


and (58) follows. Finally, satisfied \mid-^+ood^^^Fd{{F‘^Y) = 0 since by the Markov 
inequality 

A/^Fd{{Fjr) < d-Pi^E [Pi(X)3] , 

where E [Pi(X)^] is finite by Assumption 2-(3). □ 


Lemma A. 3. For all Y ^ C[ 


r( 


lim sup 

d-^ + 00 3;j^gR 


dP^E 


YivT) - '^{xi) - (£^/2)(V^'(xi)/(xi) + Y''{xi)) 


= 0 . 


Proof. Consider ad = '/Ki and lU(xi,^ 1 ,fj^) = YiUi^)- Note that iy(xi,^i,0) = Yixi). 
Then using that Y G (^“(M), a third order Taylor expansion of this function in ad implies 
there exists Ud G [0, hd\ and Mi > 0 such that 


¥.[W{xi,ii,ad)-i!{xi)\ = {fd P®/2)(V’'(xi)/(xi) + V'"(xi))+ Mid 

^ 1 3 

+ -^{.xi,fi,Ud)ad - 
^ d 

Moreover since if G (^^^(M), the third partial derivative of W in ad are bounded for all xi, 
and ad- Therefore there exists M 2 > 0 such that for all xi G M, 


dP^E U(yf^) - V^(xi)l - {f/2){A{xi)f{xi) + A\xi)) < M 2 ^P 2 ^-Pi° 


which concludes the proof.

As in [21], we prove a uniform central limit theorem for the sequence of random variables defined for $i \geq 2$ and $x_i \in \mathbb{R}$ by $C_5^{fM}(x_i, \xi_i)$. Define now for $d \geq 2$ and $x \in \mathbb{R}^d$

d 

Mrf(x) = , 

i=2 

and the characteristic function of for t G M by 

Finally define the characteristic function of the zero-mean Gaussian distribution with stan¬ 
dard deviation i^K^, given in Lemma A.l, by: for t G M, 

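Lemma A.4. There exists a sequence of sets $\{F_d^3 \subset \mathbb{R}^d, d \in \mathbb{N}^*\}$, satisfying $\lim_{d\to+\infty} d^{1/5} \pi_d((F_d^3)^c) = 0$, and we have the following properties:

(i) for all $t \in \mathbb{R}$, $\lim_{d\to+\infty} \sup_{x \in F_d^3} |\varphi_d(x,t) - \varphi(t)| = 0$,

(ii) for all bounded continuous functions $b : \mathbb{R} \to \mathbb{R}$,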


lim sup 

xeFl 


E [b (Mrf(x))] - (27 i£^°(A^)2)-V2 




= 0 . 


In particular, we have 


lim sup 

d-^+oo p3 


E 




- 2^{£^K^/2) 


= 0 


Proof. We first dehne for all d > 1, Ff ~ ^d 2 where 


Fii= n 

J=2,4 I 

F|2 = {x G M''; E 


a 

d-i J^E [C5^(X„C*)^'] -E [c5^(Xi,6y 


i=2 


<d3/4 ViG{2,-.. ,d}} . 


< d"^/^ \ , (60) 


(61) 


It follows from (52), and the Chebychev and Markov inequalities that there exists a constant 
M such that 7rrf((F|^)'^) + 'Kd{{Ff 2 Y) < Mdr^!"^. Therefore lim^^+oo d^/®7rd((-F|)'^) = 0. 

(i). Let $t \in \mathbb{R}$ and $x \in F_d^3$ and denote

$$V(x_i) = \mathrm{Var}[C_5^{fM}(x_i, \xi_i)] = \mathbb{E}[C_5^{fM}(x_i, \xi_i)^2] ,$$

where the second equality follows from Lemma A.1. By the triangle inequality


\ipd{x,t) - ip{t)\ < 


Td{x,t) - n (1 


i=2 


e^v{xi)Y 


+ 


J] (i - iDTiE) _ 

i=2 ^ 2d J 


(62)

We bound the two terms of the right hand side separately. Note that by independence for all
d, ipd{x,t) = nf=2Since X G by (61), for d large enough < 1

for all $i \in \{2, \ldots, d\}$. Thus, by [3, Eq. 26.5], we have for such large $d$, all $i \in \{2, \ldots, d\}$ and all $\delta > 0$:


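$$ \left| \varphi_i(x_i,t/\sqrt{d}) - \left(1 - \frac{\ell^{10} V(x_i) t^2}{2d}\right) \right| \le \mathbb{E}\left[ \frac{\ell^{10} t^2}{d} C_5^{\mathrm{fM}}(x_i,\xi_i)^2 \wedge \frac{\ell^{15} |t|^3}{6 d^{3/2}} \left|C_5^{\mathrm{fM}}(x_i,\xi_i)\right|^3 \right] $$

$$ \le \mathbb{E}\left[ \frac{\ell^{15} |t|^3}{6 d^{3/2}} \left|C_5^{\mathrm{fM}}(x_i,\xi_i)\right|^3 \mathbb{1}_{\{|C_5^{\mathrm{fM}}(x_i,\xi_i)| \le \delta d^{1/2}\}} \right] + \mathbb{E}\left[ \frac{\ell^{10} t^2}{d} C_5^{\mathrm{fM}}(x_i,\xi_i)^2\, \mathbb{1}_{\{|C_5^{\mathrm{fM}}(x_i,\xi_i)| > \delta d^{1/2}\}} \right] $$

$$ \le \frac{\delta \ell^{15} |t|^3}{6d}\, \mathbb{E}\left[ C_5^{\mathrm{fM}}(x_i,\xi_i)^2 \right] + \frac{\ell^{10} t^2}{\delta^2 d^2}\, \mathbb{E}\left[ C_5^{\mathrm{fM}}(x_i,\xi_i)^4 \right] . $$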


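In addition, by [3, Lemma 1, Section 27] and using this result, we get:

$$ \left| \varphi_d(x,t) - \prod_{i=2}^{d} \left(1 - \frac{\ell^{10} V(x_i) t^2}{2d}\right) \right| \le \sum_{i=2}^{d} \left\{ \frac{\delta \ell^{15} |t|^3}{6d}\, \mathbb{E}\left[ C_5^{\mathrm{fM}}(x_i,\xi_i)^2 \right] + \frac{\ell^{10} t^2}{\delta^2 d^2}\, \mathbb{E}\left[ C_5^{\mathrm{fM}}(x_i,\xi_i)^4 \right] \right\} $$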


< (^E 

+ fE 




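where the last inequality follows from $x \in F_d^3$ and (60). Let now $\epsilon > 0$, and choose $\delta$ small
enough such that the first term is smaller than $\epsilon/2$. Then there exists $d_0 \in \mathbb{N}^*$ such that for
all $d \ge d_0$, the second term is smaller than $\epsilon/2$ as well. Therefore, for $d \ge d_0$ we get

$$ \sup_{x\in F_d^3} \left| \varphi_d(x,t) - \prod_{i=2}^{d} \left(1 - \frac{\ell^{10} V(x_i) t^2}{2d}\right) \right| \le \epsilon . $$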


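Consider now the second term of (62). By the triangle inequality,

$$ \left| \prod_{i=2}^{d} \left(1 - \frac{\ell^{10} V(x_i) t^2}{2d}\right) - e^{-\ell^{10}(K^{\mathrm{fM}})^2 t^2/2} \right| \le \left| \prod_{i=2}^{d} \left(1 - \frac{\ell^{10} V(x_i) t^2}{2d}\right) - \prod_{i=2}^{d} e^{-\ell^{10} V(x_i) t^2/(2d)} \right| + \left| \prod_{i=2}^{d} e^{-\ell^{10} V(x_i) t^2/(2d)} - e^{-\ell^{10}(K^{\mathrm{fM}})^2 t^2/2} \right| . \tag{63} $$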


We deal with the two terms separately. First, since for all $x_i$, $V(x_i) \ge 0$, we have
$\left| 1 - V(x_i)\ell^{10} t^2/(2d) - e^{-V(x_i)\ell^{10} t^2/(2d)} \right| \le V(x_i)^2 \ell^{20} t^4/(8 d^2)$.
Using this result, [3, Lemma 1, Section 27] and the Cauchy-Schwarz inequality, it follows:
d 


n 1 


i=2 


2d 


-n= 

i=2 


/{2d) 


< ^ |l - Y{xi)i^^t^/{2d) - 

i=2 

d 


i=2 




(64) 


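where the last inequality is implied by (60). Finally, since $s \mapsto e^{s}$ is 1-Lipschitz on $\mathbb{R}_-$, and
using (60), we get

$$ \left| \prod_{i=2}^{d} e^{-\ell^{10} V(x_i) t^2/(2d)} - e^{-\ell^{10}(K^{\mathrm{fM}})^2 t^2/2} \right| \le (t^2 \ell^{10}/2) \left| \sum_{i=2}^{d} d^{-1} V(x_i) - (K^{\mathrm{fM}})^2 \right| . \tag{65} $$

Therefore, combining (64) and (65) in (63), we get:

$$ \lim_{d\to+\infty} \sup_{x\in F_d^3} \left| \prod_{i=2}^{d} \left(1 - \frac{\ell^{10} V(x_i) t^2}{2d}\right) - e^{-\ell^{10}(K^{\mathrm{fM}})^2 t^2/2} \right| = 0 , $$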


which concludes the proof of (i). 
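Two elementary inequalities carry the argument for (i): the product bound $|\prod_i z_i - \prod_i w_i| \le \sum_i |z_i - w_i|$ for complex numbers of modulus at most one, and $|1 - u - e^{-u}| \le u^2/2$ for $u \ge 0$. A minimal numerical sanity check is sketched below. The helper names and the randomly drawn inputs are assumptions made for illustration; the check is not part of the proof.

```python
import cmath
import math
import random


def product_gap(zs, ws):
    """|prod(zs) - prod(ws)| for two equally long lists of complex numbers."""
    prod_z = complex(1.0)
    prod_w = complex(1.0)
    for z, w in zip(zs, ws):
        prod_z *= z
        prod_w *= w
    return abs(prod_z - prod_w)


def termwise_gap(zs, ws):
    """sum_i |z_i - w_i|, the upper bound in the product inequality."""
    return sum(abs(z - w) for z, w in zip(zs, ws))


def random_unit_disc(rng, n):
    """n complex numbers of modulus at most one (hypothetical test inputs)."""
    return [
        cmath.rect(math.sqrt(rng.random()), 2.0 * math.pi * rng.random())
        for _ in range(n)
    ]


def exp_remainder(u):
    """|1 - u - exp(-u)|, to be compared with u^2 / 2 for u >= 0."""
    return abs(1.0 - u - math.exp(-u))
```

In the proof, the $z_i$ are the factors $\varphi_i(x_i, t/\sqrt{d})$ and the $w_i$ their second-order approximations, which is why the moduli must be at most one for $d$ large enough.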

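(ii) Let $b : \mathbb{R} \to \mathbb{R}$ be a bounded continuous function. Consider the sequence $\{x^d,\ d \in \mathbb{N}^*\}$
of elements of $F_d^3$ which satisfies, for all $d \in \mathbb{N}^*$,

$$ \sup_{y\in F_d^3} \left| \mathbb{E}\left[b(M_d(y))\right] - (2\pi\ell^{10}(K^{\mathrm{fM}})^2)^{-1/2} \int_{\mathbb{R}} b(u)\, e^{-u^2/(2\ell^{10}(K^{\mathrm{fM}})^2)}\, du \right| \le \left| \mathbb{E}\left[b(M_d(x^d))\right] - (2\pi\ell^{10}(K^{\mathrm{fM}})^2)^{-1/2} \int_{\mathbb{R}} b(u)\, e^{-u^2/(2\ell^{10}(K^{\mathrm{fM}})^2)}\, du \right| + d^{-1} . \tag{66} $$

Then, using (i) and Lévy's continuity theorem, we get

$$ \lim_{d\to+\infty} \left| \mathbb{E}\left[b(M_d(x^d))\right] - (2\pi\ell^{10}(K^{\mathrm{fM}})^2)^{-1/2} \int_{\mathbb{R}} b(u)\, e^{-u^2/(2\ell^{10}(K^{\mathrm{fM}})^2)}\, du \right| = 0 . $$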


This limit and (66) conclude the proof. 


□ 
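Part (ii) rests on Lévy's continuity theorem: pointwise convergence of characteristic functions to that of a Gaussian yields convergence of expectations of bounded continuous functions. As a toy illustration only, the sketch below uses a normalised sum of independent Rademacher signs (an assumption made for simplicity, not the variables of the lemma), whose characteristic function $\cos(t/\sqrt{d})^d$ can be compared with the Gaussian limit $e^{-t^2/2}$. All names are hypothetical.

```python
import math


def rademacher_sum_cf(t, d):
    """Characteristic function at t of d^{-1/2} times a sum of d independent +/-1 signs."""
    return math.cos(t / math.sqrt(d)) ** d


def gaussian_cf(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-0.5 * t * t)


def cf_gap(t, d):
    """Pointwise distance between the two characteristic functions."""
    return abs(rademacher_sum_cf(t, d) - gaussian_cf(t))
```

Evaluating `cf_gap(1.0, d)` for increasing `d` shows the gap shrinking, which is the one-dimensional analogue of statement (i) with a fixed starting point.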


Proof of Theorem 3.1. The theorem follows from Lemma A.1, (58) in Theorem A.2 and the
last statement in Lemma A.4. □


Proof of Theorem 3.2. Consider $F_d = \bigcap_{j=1}^{3} F_d^j$, where the sets $F_d^j$ are given respectively in Lemma
A.1, Theorem A.2 and Lemma A.4. We then obtain $\lim_{d\to+\infty} d^{1/5}\pi_d((F_d)^c) = 0$ and, by the
union bound, for all $T > 0$,


$$ \lim_{d\to+\infty} \mathbb{P}\left[ \Gamma_s^d \in F_d,\ \forall s \in [0,T] \right] = 1 . $$

Furthermore, combining the former results with Lemma A.3, we have for all $V \in C_c^\infty(\mathbb{R})$
(seen as a function of the first component):


lim sup 

d^+oo xeFd, 






= 0 . 


Then, the weak convergence follows from [6, Corollary 8.7, Chapter 4].


□ 


B Postponed proofs 

B.1 Proof of Lemma 4.1

By Assumptions 3-4, $\pi$ and $q$ are positive and continuous. It follows from [16, Lemma 1.2]
that $P$ is $\mathrm{Leb}^d$-irreducible and aperiodic, where $\mathrm{Leb}^d$ is the Lebesgue measure on $\mathbb{R}^d$. In addition,
all compact sets $C$ such that $\mathrm{Leb}^d(C) > 0$ are small for $P$. Now, by [17, Theorem 15.0.1], we
just need to check the drift condition (20). By a simple calculation, using $\alpha(x,y) \le 1$ for
all $x, y \in \mathbb{R}^d$ and the Cauchy-Schwarz inequality, we get

$$ PV(x) \le 1 + \|x\|^2 + \left(\|\mu(x)\|^2 - \|x\|^2\right) \int_{\mathbb{R}^d} \alpha(x,y) q(x,y)\, dy + (2\pi)^{-d/2} \left(2\|\mu(x)\|\,\|S(x)\| + \|S(x)\|^2\right) \int_{\mathbb{R}^d} \max(\|\xi\|, \|\xi\|^2)\, e^{-\|\xi\|^2/2}\, d\xi . $$

By (23), $\limsup_{\|x\|\to+\infty} \left(2\|\mu(x)\|\,\|S(x)\| + \|S(x)\|^2\right) \|x\|^{-2} = 0$. Therefore, using again the
first inequality of (23) and Assumption 5:

$$ \limsup_{\|x\|\to+\infty} PV(x)/V(x) \le 1 - (1 - \tau^2) \liminf_{\|x\|\to+\infty} \int_{\mathbb{R}^d} \alpha(x,y) q(x,y)\, dy < 1 . $$

This concludes the proof of Lemma 4.1. □
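For orientation, the kernel $P$ studied in these proofs is a Hastings-Metropolis kernel whose Gaussian proposal has a position-dependent mean and scale, with acceptance probability $\min(1, \pi(y)q(y,x)/(\pi(x)q(x,y)))$. The one-dimensional sketch below is only an illustration under stated assumptions: the scalar callables `mu` and `s` stand in for $\mu$ and $S$, `log_pi` is an unnormalised log target, and none of these names come from the paper.

```python
import math
import random


def log_q(x, y, mu, s):
    """Log density at y of the Gaussian proposal N(mu(x), s(x)^2) started from x."""
    scale = abs(s(x))
    resid = (y - mu(x)) / scale
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * resid * resid - math.log(scale)


def log_alpha(x, y, log_pi, mu, s):
    """Log of the acceptance probability min(1, pi(y) q(y,x) / (pi(x) q(x,y)))."""
    log_ratio = log_pi(y) + log_q(y, x, mu, s) - log_pi(x) - log_q(x, y, mu, s)
    return min(0.0, log_ratio)


def mh_step(x, log_pi, mu, s, rng):
    """One Hastings-Metropolis transition from x (hypothetical helper)."""
    y = mu(x) + s(x) * rng.gauss(0.0, 1.0)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) < log_alpha(x, y, log_pi, mu, s):
        return y
    return x
```

With `mu = lambda x: x` and `s = lambda x: 1.0` the proposal is a symmetric random walk and `log_alpha` reduces to `min(0, log_pi(y) - log_pi(x))`. Other choices of `mu` and `s` change only the proposal, not the structure of the step.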


B.2 Proof of Theorem 4.2 

We prove this result by contradiction. The strategy of the proof is the following: first, under
our assumptions, most of the moves proposed by the algorithm have a norm greater than that
of the current point. However, if $P$ is geometrically ergodic, then the rejection probability
of the algorithm is bounded above by some constant strictly smaller than
1. Combining these facts, we can exhibit a sequence of points $\{x_n,\ n \in \mathbb{N}\}$ such that
$\lim_{n\to+\infty} \pi(x_n) = +\infty$. Since we assume that $\pi$ is bounded, we have our contradiction.

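If $P$ is geometrically ergodic, then by [25, Theorem 5.1], there exists $\eta > 0$ such that for
almost every $x \in \mathbb{R}^d$,

$$ \int_{\mathbb{R}^d} \alpha(x,y) q(x,y)\, dy \ge \eta , \tag{67} $$

and let $M \ge 0$ be such that

$$ \mathbb{P}\left[\|\xi\| \ge M\right] \le \eta/2 , \tag{68} $$

where $\xi$ is a standard $d$-dimensional Gaussian random variable. By (24), there exist $\delta > 0$ and $R_\epsilon \ge 0$
such that

$$ \inf_{\{\|x\| \ge R_\epsilon\}} \|S(x)^{-1}\mu(x)\|\, \|x\|^{-1} \ge \epsilon^{-1} + \delta , \tag{69} $$

$$ \inf_{\{\|x\| \ge R_\epsilon\}} \inf_{\|z\| = 1} \|S(x) z\| \ge \epsilon (1 + \delta\epsilon/2)^{-1} . \tag{70} $$

Note that we can assume $R_\epsilon$ is large enough so that

$$ \epsilon \delta R_\epsilon/2 \ge M . \tag{71} $$

Now define, for $x \in \mathbb{R}^d$ with $\|x\| \ge R_\epsilon$,

$$ B(x) = \left\{ y \in \mathbb{R}^d \ \middle|\ \|S(x)^{-1}(y - \mu(x))\| \le M \right\} . \tag{72} $$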


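Note that if $y \in B(x)$, we have by definition and the triangle inequality $\|S(x)^{-1} y\| \ge \|S(x)^{-1}\mu(x)\| - M$.
Therefore, by (69)-(70) and (71),

$$ \|y\| = \|S(x) S(x)^{-1} y\| \ge \epsilon (1 + \delta\epsilon/2)^{-1} \|S(x)^{-1} y\| \ge \epsilon (1 + \delta\epsilon/2)^{-1} \left\{ (\epsilon^{-1} + \delta) \|x\| - M \right\} \ge \|x\| . \tag{73} $$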

We then show that this inequality implies

$$ \liminf_{\|x\|\to+\infty} \inf_{y \in B(x)} \frac{q(y,x)}{q(x,y)} = 0 . \tag{74} $$

Let $x \in \mathbb{R}^d$ with $\|x\| \ge R_\epsilon$, and $y \in B(x)$. First, it is straightforward by (72) that $|S(x)|\, q(x,y)$ is
uniformly bounded away from 0, and it suffices to consider $|S(x)|\, q(y,x)$. By (70)-(73), we
have $\|y\| \ge R_\epsilon$ and, for all $z \in \mathbb{R}^d$, $\|S(y) z\| \ge \epsilon (1 + \delta\epsilon/2)^{-1} \|z\|$, which implies for all $z \in \mathbb{R}^d$,
$\epsilon^{-1}(1 + \delta\epsilon/2) \|z\| \ge \|S(y)^{-1} z\|$. By this inequality and (69), we have

$$ \left| \|S(y)^{-1}\mu(y)\| - \|S(y)^{-1} x\| \right| \ge \|S(y)^{-1}\mu(y)\| - \|S(y)^{-1} x\| \ge (\epsilon^{-1} + \delta) \|y\| - \epsilon^{-1}(1 + \delta\epsilon/2) \|x\| \ge (\delta/2) \|y\| , \tag{75} $$


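where the last inequality follows from (73). Using this result, the triangle inequality, (75)-(70)
and (73), we get

$$ q(y,x) = (2\pi)^{-d/2} \exp\left\{ -(1/2) \|S(y)^{-1}(x - \mu(y))\|^2 - \log(|S(y)|) \right\} $$

$$ \le (2\pi)^{-d/2} \exp\left\{ -(1/2) \left( \|S(y)^{-1}\mu(y)\| - \|S(y)^{-1} x\| \right)^2 - \log(|S(y)|) \right\} $$

$$ \le (2\pi)^{-d/2} \exp\left\{ -(\delta^2/8) \|y\|^2 - \log(|S(y)|) \right\} $$

$$ \le (2\pi)^{-d/2} \exp\left\{ -(\delta^2/8) \|x\|^2 - d \log\left(\epsilon (1 + \delta\epsilon/2)^{-1}\right) \right\} . $$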

This inequality and (25) imply $\lim_{\|x\|\to+\infty} \inf_{y \in B(x)} |S(x)|\, q(y,x) = 0$, and then (74).
Therefore there exists $R_q \ge 0$ such that for all $x \in \mathbb{R}^d$ with $\|x\| \ge R_q$,

$$ \inf_{y \in B(x)} \frac{q(y,x)}{q(x,y)} \le \eta/4 . \tag{76} $$

Now we are able to build the sequence $\{x_n,\ n \in \mathbb{N}\}$ such that for all $n \in \mathbb{N}$, $\|x_{n+1}\| \ge
\max(R_\epsilon, R_q)$ and $\lim_{n\to+\infty} \pi(x_n) = +\infty$. Indeed, let $x_0 \in \mathbb{R}^d$ be such that $\|x_0\| \ge \max(R_\epsilon, R_q)$.

Assume that we have built the sequence up to the $n$th term, such that for all $k = 0,\dots,n-1$,
$\|x_{k+1}\| \ge \max(R_\epsilon, R_q)$ and $\pi(x_{k+1}) \ge (3/2)\pi(x_k)$. Now we choose $x_{n+1}$ depending on $x_n$,
satisfying $\pi(x_{n+1}) \ge (3/2)\pi(x_n)$ and $\|x_{n+1}\| \ge \max(R_\epsilon, R_q)$. Since $\|x_n\| \ge \max(R_\epsilon, R_q)$, by
(67)-(68) and (76),

$$ \eta \le \int_{\mathbb{R}^d} \alpha(x_n,y) q(x_n,y)\, dy \le \eta/2 + \int_{B(x_n)} \min\left(1, \frac{\pi(y) q(y,x_n)}{\pi(x_n) q(x_n,y)}\right) q(x_n,y)\, dy \le \eta/2 + (\eta/4) \int_{B(x_n)} \frac{\pi(y)}{\pi(x_n)}\, q(x_n,y)\, dy . $$


This inequality implies that $\int_{B(x_n)} \frac{\pi(y)}{\pi(x_n)}\, q(x_n,y)\, dy \ge 2$, and therefore there exists $x_{n+1} \in
B(x_n)$ such that $\pi(x_{n+1}) \ge (3/2)\pi(x_n)$, and since $x_{n+1} \in B(x_n)$, by (73), $\|x_{n+1}\| \ge \max(R_\epsilon, R_q)$.
Therefore, we have a sequence $\{x_n,\ n \in \mathbb{N}\}$ such that for all $n \in \mathbb{N}$, $\pi(x_{n+1}) \ge (3/2)\pi(x_n)$.
Since by assumption $\pi(x_0) > 0$, we get $\lim_{n\to+\infty} \pi(x_n) = +\infty$, which contradicts the
assumption that $\pi$ is bounded. This concludes the proof of Theorem 4.2. □
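The two arithmetic steps behind the construction can be written out explicitly. In the display below, $I_n$ is a shorthand introduced here for illustration only; it denotes the integral $\int_{B(x_n)} (\pi(y)/\pi(x_n))\, q(x_n,y)\, dy$ appearing in the proof.

```latex
% Step 1: the bound  eta <= eta/2 + (eta/4) I_n  forces I_n >= 2.
\eta \le \frac{\eta}{2} + \frac{\eta}{4}\, I_n
\quad\Longrightarrow\quad
I_n \ge 2 ,
\qquad\text{and, since } \int_{B(x_n)} q(x_n,y)\,dy \le 1 ,\quad
\sup_{y \in B(x_n)} \frac{\pi(y)}{\pi(x_n)} \ge I_n \ge 2 > \frac{3}{2} .

% Step 2: iterating  pi(x_{n+1}) >= (3/2) pi(x_n)  gives geometric growth.
\pi(x_n) \ge \frac{3}{2}\,\pi(x_{n-1}) \ge \cdots \ge
\left(\frac{3}{2}\right)^{n} \pi(x_0)
\xrightarrow[n\to+\infty]{} +\infty ,
\qquad\text{since } \pi(x_0) > 0 .
```

The supremum exceeding $3/2$ is what guarantees that a point $x_{n+1} \in B(x_n)$ with $\pi(x_{n+1}) \ge (3/2)\pi(x_n)$ exists.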


C Expressions of 


+ '^5^i9^‘^Hxi)9' (xi) + ^0Cf9^^Hxi)g" (xi) 

+1065^^^ (xi)/(xi) + {xi)9'{xif + 35(ig' {xi)g" {xif'^ 


Cr°(xuSl) = 1539(‘‘)(1.,V(I,) 

1 9Q 7 

+ -^Ci9^'^\xi)g'ixi) + —^?5®(xi)/(xi) - —Ci5®(3;i)/(xi) 

+ ^^i9'^^Hxi)9'ixif + ^^ig'{xi)g"{xif^ . 

+ ^65^^^(a;i)ff'(a;i) + ^Cl9^^Hxi)g"{xi) - ^^ig^^\xi)g"{xi) 

+^Ci9^^Hxi)g'{xif + ^^ig'{xi)g"{xif^ . 
+ ^^i9*'^’(xi)9'(xi) + :^a3^i9*^*(xi)9"(xi) + ^a4$i9‘^Hxi)9"(xi) 

48 72 0 

- \aliig^^\xi)g''{xi) + -^Cf9^^Hxi)g'\xi) + (xi)c/"(xi) 

+ ^^i9^^Hxi)g'{xif - ^a‘l^ig'{xi)g"{xif + ^al^ig'{xi)g''{xif 

+ ^^i9'{xi)g''ixif 


D Expressions of K* 


We provide here the expressions of the quantities $K^*$ involved in Theorems 3.1, 3.2, 3.4.
Let $X$ be a random variable distributed according to $\pi_1$.


=E 


79g(^Hx)^ , llg(4)(x)2g'(X)2 ^ 77g^^') (X^g'^iX) 


+ 


17280 

?9"{^r 

20736 ' 576' 


+ 


1152 2592 

+ + ^9^^H^)9^^K^)9'{^) + ^9("Hx)ff(5)(x)/(X) 


+ ^g(^){Xfg'{X)^ 


864^ 

^2 


+ ^5^3)(x)5(5)(x)5'(x)2 + 

+ ^5^'^(X)5(")(X)5'(X)3 + J-g(-^\x)g'{X)^g"{X)^ + 


864^ 


1728 


+ X9'='(X)2<,'(X)2<,"(X) + + ^9'’>(X)9<‘‘I(X)9'(X)9"(X) 


„„0^„ 799<-^>(X)" llg(-‘)(X)^g'(X)^ 15679W(X)^g"(X)^ 

17280 1152 3456 

+ + j^9'iXyg"{Xy + ^5^"^(X)5(')(X)5'(X) 

+ 3^5(')(X)5(^)(X)/(X) + ^5®(X)ff(^Hx)ff'(X)2 
+ ^ff(5)(x)5'(X)/(X)2 + ^5(3 )(x)5(4)(X)5'(X)3+ 
^<7^4nX)9'(X)2/(X)2 + ^g(^\X)g'{Xfg"{Xf 
+ ^ 9 ^^\xfg'{Xyg"{X) + ^g(^\X)g'{X)g" {Xf + 
^g^^\X)g(^\X)g\X)g'\X) . 


= E 


^g'iXfg''{Xyai + ^g" (X)^ g^^\xf aj 


, 3.5. 
+ -ff'(X)/(X)35(3)(X)a^ - -alg'iXfg"iX)^al + -g'{Xf g"{X)^al 
+ ^^g"{Xfg^^\xfal + ^a3/(X)25(3) (X)2a2 
+ ^g'{Xfg"{X)g^^\xfal - ^alg'{X)g"{Xf g^^\X)al 
+ ^g'{X)g"{Xfg^^){X)al + ^a,g'{X)g"{Xfg^^) {X)al 
+ ^g'{Xfg"{Xfg^^\X)al + ^g'{Xf g" {Xf g^^\X)al 
+ ^g'{X)g''{X)g^^\X)g^^\X)al + ^g'{X)g" {Xf g^^\X)al 
+ ^g"{X)g^^Hx)g^^\x)al + ^a\g'{Xf g” {Xf 

- :^a\g'{Xfg"{Xf + ^^g'{xfg"{Xf + [xf g^^\xf 

alg''{Xfg^^\Xf 1 „.^,2 . 3 ) ,^,2 

7V'(X)y»(X)3 ^ 

+ —a3g'(X)2/(X)ff(3)(x)2 + (X)2gW(X)2 

gg^ V y y V yy v y ^^^2 

79g(^)(X)2 _ ]_ 2 i^x)g"{Xfg^^\X) 

17280 gg ly V yy V y y v y 
+ l5'(X)/(X)35(3)(X) - ^a?a35'(X)/(X)35(3)(X) 

+ ^a3g'(X)/(X)35(3)(X) - ^a25'(X)3/(X)25(3)(X) 

+ ^5'(X)3/(X)25(3)(X) - ■^^aig'{Xfg''{Xfg^^\x) 

+ ■^^g'{Xfg"{Xfg^^\X) + ■^^g'{Xfg^^\X)g^^Hx) 

+ ^g'{X)g"{X)g^^\X)g^^\X) + ^a35'(X)/(X)ff(3) (x)^^ (X) 

- ^a?5'(X)/(X)25(5)(X) + ^5'(X)/(X)25(5)(X)+ 

^g'{Xfg^^\X)g^^\X) + ^g" {X)g^^\X)g^^\X) 

+ ^^a^g"{X)g^^\X)g^^\X) + -^g'{X)g^^\X)g^^\X) . 